chill_notes/Prompts/Speaker Auto-Tag Prompt.md at b97d594272495bd45aca4e2c81e0c5d76b17902a

FNS Service 6354da37c1 Update from Sync Service

2026-04-27 23:13:16 +08:00

13 KiB

Executable File

Performance-optimized version of Speaker Auto-Tag Prompt with batched People note resolution and context-based confidence refinement.

Self-contained prompt for tagging unidentified speakers in one transcript stub using vault data (People notes, calendar attendees). Returns proposed speaker mappings; the caller handles review and writes.

CRITICAL: Never read the full transcript in a single Read call. Transcripts exceed the 10k token file-read limit. Use chunked reads (limit=500 per chunk). See Step 2.

OPTIMIZATION FOCUS: Read People notes deterministically from the parent meeting invitees list (no broad name-based glob), batch all invitee reads in parallel, build a comprehensive context table before analysis, and use role/context matching for confidence refinement.

Step 1: Setup

If Read, Glob, Grep, and Bash are not already available, load them: ToolSearch("select:Read,Glob,Grep,Bash")

Extract from the calling message:

Transcript — path to the transcript stub file
Microphone user — full name of the person at the microphone
Calendar Attendees — pre-fetched attendee context (optional; skips Step 4 if provided)
People Roster — pre-built People context table (optional; skips Step 3 if provided — use as the People context table directly)
People Folder — folder name for People notes (default: People)
Output format — expected output format for the mapping

Path resolution: try the Transcript path as-is. If the file is not found, stop and return an error that includes the path.

Vault root derivation: extract vault root from the transcript path by removing the transcript folder suffix. Use this root combined with People Folder for all lookups.

Derived values (from filename, no read needed):

MEETING_TITLE — strip date prefix (first 13 chars for "YYYY-MM-DD - " or first 11 chars for "YYYY-MM-DD ") and " - Transcript" suffix
SANITIZED_SUBJECT — take MEETING_TITLE, replace /, \, and : with hyphens. Used for cache Glob in Step 2.
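
The two derived values can be sketched in Python as below. This is a minimal illustration, not part of the prompt contract: the function names are hypothetical, and the character set replaced in `sanitize_subject` assumes the three characters are `/`, `\`, and `:`.

```python
import re
from pathlib import PurePosixPath

def derive_meeting_title(transcript_path: str) -> str:
    """Strip the date prefix and the ' - Transcript' suffix from the filename."""
    stem = PurePosixPath(transcript_path).stem  # filename without ".md"
    # Matches "YYYY-MM-DD - " (13 chars) or "YYYY-MM-DD " (11 chars).
    stem = re.sub(r"^\d{4}-\d{2}-\d{2} (?:- )?", "", stem)
    suffix = " - Transcript"
    if stem.endswith(suffix):
        stem = stem[: -len(suffix)]
    return stem

def sanitize_subject(title: str) -> str:
    """Replace path-hostile characters with hyphens (assumed set: / \\ :)."""
    return re.sub(r"[/\\:]", "-", title)
```

The sanitized value is what gets substituted into the roster cache Glob pattern in Step 2.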

Step 2: Read Transcript + Glob Roster Cache

Round 1 — issue in a single parallel batch:

Read(transcript_path, limit=60) — frontmatter only
Bash: wc -l transcript_path — total line count
Glob("Caches/Speaker Rosters/SANITIZED_SUBJECT.md") — roster cache for Step 3d

Extract from frontmatter:

session_id (or macwhisper_session_id for WhisperCal stubs)
speakers — array with name, id, stub flag, line_count per speaker
meeting_note — wiki-link to parent meeting note (used in Step 3a)
calendar_event — event title, "none", or absent
calendar_attendees — array of plain string names (or invitees for WhisperCal stubs, or meeting_invitees for wiki-link entries such as "- [[People/Name]]")
meeting_subject — used to validate cache match
is_recurring — enables roster cache path
pipeline_state — current pipeline state
tags — detect WhisperCal stubs (tags: [transcript])

Pipeline state: any value (tagged, extracted, summarized, titled) or absent means proceed.

Round 2 — transcript body chunks in a single parallel batch. Use the line count from Round 1. Read in chunks of 500 lines, starting after the frontmatter:

Read(transcript_path, offset=61, limit=500) — chunk 1
Read(transcript_path, offset=561, limit=500) — chunk 2
Read(transcript_path, offset=1061, limit=500) — chunk 3 if needed
Continue until total lines are covered.

Most transcripts need 1 to 3 chunks. The full body across all chunks is the primary input for Step 6.
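
The chunk offsets follow directly from the line count returned by `wc -l`. A minimal sketch of the arithmetic, assuming `offset` is a 1-based line number as in the Read examples above (the helper name `plan_chunks` is hypothetical):

```python
def plan_chunks(total_lines: int, frontmatter_lines: int = 60, chunk_size: int = 500):
    """Return (offset, limit) pairs covering the body after the frontmatter read."""
    chunks = []
    offset = frontmatter_lines + 1  # first body line, 1-based
    while offset <= total_lines:
        chunks.append((offset, chunk_size))
        offset += chunk_size
    return chunks
```

For a 1,200-line transcript this yields the three reads shown in Round 2; every pair can be issued in the same parallel batch.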

Step 3: Read Invitees from Parent Meeting Note (OPTIMIZED)

Skip this entire step if People Roster was provided in the invocation parameters. Use the provided roster as the People context table directly and proceed to Step 4, which is also skipped in that case.

CRITICAL: If no People Roster was provided, this step is mandatory. Complete it before Step 4 to maximize cache hits and build context early.

3a. Read Parent Meeting Note

The transcript frontmatter contains meeting_note field with a wiki-link to the parent meeting note (e.g., [[2026-04-04 - Test Transcript]]).

Extract the meeting note path from the wiki-link
Read the meeting note (limit=100 to capture frontmatter + invitees section)
Extract the meeting_invitees field — this contains an array of People note wiki-links or plain attendee names

Example invitees field:

meeting_invitees:
  - "[[People/Joe Jackson]]"
  - "[[People/Tanner Bragg]]"
  - "[[People/Gregory Porter]]"
  - "[[People/Andrew Davis]]"

Or plain names:

meeting_invitees:
  - Joe Jackson
  - Tanner Bragg
  - Gregory Porter

3b. Parse Invitees & Build File Paths (DETERMINISTIC)

For each invitee:

If wiki-link format [[People/Name]]: extract the path as People/Name.md
If plain name: construct path as {People Folder}/{name}.md (People/{name}.md with the default folder)

Issue ALL Read calls in a single parallel batch. This is deterministic — no glob phase.

Example (if 4 invitees):

Read(People/Joe Jackson.md, limit=60)
Read(People/Tanner Bragg.md, limit=60)
Read(People/Gregory Porter.md, limit=60)
Read(People/Andrew Davis.md, limit=60)
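
The path construction in 3b can be sketched as follows. The helper name is hypothetical, and the handling of `|alias` or `#heading` suffixes inside a wiki-link is an assumption added for robustness; the source only describes the plain `[[People/Name]]` form.

```python
import re

# Captures the link target; an optional "|alias" or "#heading" tail is ignored (assumption).
WIKI_LINK = re.compile(r"^\[\[([^\]|#]+)(?:[|#][^\]]*)?\]\]$")

def invitee_to_path(invitee: str, people_folder: str = "People") -> str:
    """Map one meeting_invitees entry to a People note path, with no glob."""
    entry = invitee.strip().strip('"')
    match = WIKI_LINK.match(entry)
    if match:
        return f"{match.group(1).strip()}.md"  # wiki-link already carries the folder
    return f"{people_folder}/{entry}.md"       # plain name: prepend the People folder
```

Each resulting path feeds one Read call in the parallel batch.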

3c. Build People Context Table

From each People note read in Step 3b, extract:

full_name
nickname (if present)
role_title
company/org

Build a Context value per person as:

"role_title, company/org" (both present)
"role_title" (org empty)
"company/org" (role empty)
"" (both empty)

Also record a Source value for each person, one of: "meeting_invitee", "calendar", "microphone_user", "vocative_recovery"
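
The four Context cases reduce to joining whichever fields are non-empty. A minimal sketch (the function name `build_context` is hypothetical):

```python
def build_context(role_title: str = "", org: str = "") -> str:
    """Join the non-empty fields with ', '; returns '' when both are empty."""
    parts = [p.strip() for p in (role_title, org) if p and p.strip()]
    return ", ".join(parts)
```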

This table is the foundation for Step 6 confidence refinement. Do not skip to Step 4 until this is complete.

3d. Roster Cache Merge (If Cache Hit)

If a cache was found in Step 2:

Compare the table from Step 3c against the cached roster
If all invitees are present in cache: use cached full context
If new names found: merge Step 3c results into cache, update cache generated date

Step 4: Calendar Attendees

Skip if Calendar Attendees was provided in invocation (use it directly). Also skip if People Roster was provided.

calendar_event is "none" — skip calendar lookups entirely (ad-hoc meeting)
calendar_attendees, invitees, or meeting_invitees populated — use directly (strip [[ ]] wiki-link wrappers if present)
No calendar data — proceed without. Calendar context improves confidence but is not required.

Add all calendar attendees to the People context table from Step 3c with Source="calendar".

Step 5: Vocative-to-Speaker Mapping

5a. Vocative Scanning (Pre-Step 6)

Scan the full transcript body (from Step 2) for direct address patterns:

"[Name], go ahead"
"Thanks, [Name]"
"[Name], what do you think?"
"[Name], can you..."
"Over to you, [Name]"
"[Name]?" (standalone calling)

5b. Match Against People Context

For each vocative detected, check the People context table (Step 3c):

Match against Full Name (exact or first word)
Match against Nickname
Multiple detections for the same name strengthen signal
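
Steps 5a and 5b can be approximated with a few regular expressions and a lookup against the context table. This is a rough sketch under stated assumptions: the regexes cover only the listed phrasings and will produce false positives (for example "Okay, can you..."), which the lookup then sorts into `unmatched`; the dictionary keys `full_name` and `nickname` and both function names are hypothetical.

```python
import re
from collections import Counter

NAME = r"([A-Z][a-z]+)"
# Illustrative regexes for the direct-address patterns listed in 5a.
VOCATIVE_PATTERNS = [
    re.compile(rf"\b{NAME}, (?:go ahead|what do you think|can you)\b"),
    re.compile(rf"\b(?:Thanks|Over to you), {NAME}\b"),
    re.compile(rf"^{NAME}\?$"),  # standalone calling
]

def find_vocatives(utterance: str) -> list[str]:
    """Return candidate names addressed directly in one utterance."""
    text = utterance.strip()
    names: list[str] = []
    for pattern in VOCATIVE_PATTERNS:
        names.extend(pattern.findall(text))
    return names

def match_vocatives(vocatives, people):
    """Split names into matched (with counts), ambiguous, and unmatched."""
    matched, ambiguous, unmatched = Counter(), {}, []
    for name in vocatives:
        key = name.casefold()
        hits = [
            p["full_name"] for p in people
            if key == p["full_name"].casefold()
            or key == p["full_name"].split()[0].casefold()
            or key == (p.get("nickname") or "").casefold()
        ]
        if len(hits) == 1:
            matched[hits[0]] += 1      # repeated hits strengthen the signal
        elif hits:
            ambiguous[name] = hits     # first-name collision, see Rule 5
        else:
            unmatched.append(name)     # candidates for the 5c recovery batch
    return matched, ambiguous, unmatched
```

The `unmatched` list is the input to the recovery Globs in 5c, and the counts in `matched` correspond to "multiple detections strengthen signal".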

5c. Unmatched Vocative Recovery Batch (OPTIMIZED)

If any vocative names do not match People context:

Collect all unmatched names (e.g., "Kev", "Chew", "Alex", "Joe").

Issue recovery Globs in a single parallel batch:

Glob(People Folder/*Kev*.md)
Glob(People Folder/*Chew*.md)
Glob(People Folder/*Alex*.md)
Glob(People Folder/*Joe*.md)

For each glob hit:

Issue a Read in a single parallel batch (same pattern as Step 3b)
Extract full_name, nickname, role_title, company/org
Add to People context table with Source="vocative_recovery"

For misses: flag the unmatched_vocative in the final evidence field for that speaker.

Step 6: Speaker Identification Analysis

Analyze the full transcript body (from Step 2), speaker stubs (frontmatter), and complete People context table (from Steps 3c, 4, and 5c) using the rules below. Higher-priority rules override lower ones.

CRITICAL — duplicate assignments are expected and correct. Transcription engines frequently split one real person across multiple speaker tags (e.g., Speaker 1 and Speaker 3 are both the same person). Propose the best match for each tag independently. Do NOT enforce a one-to-one constraint between people and tags. If the evidence says Speaker 1 is "Jane Doe" and Speaker 3 is also "Jane Doe", propose "Jane Doe" for both.

Rule 1: Microphone Speaker — CERTAIN

Assign the Microphone user to the "Microphone" label. If no "Microphone" label exists, assign to the speaker with the most lines. Evidence: "microphone". Never overridden.

Rule 2: Calendar Attendees — CERTAIN or HIGH

Calendar + vocative match (see Rule 3) = CERTAIN
Calendar + one other transcript signal (style, topic expertise, vocative response) = HIGH
Calendar alone with no transcript evidence = do not assign. Invitees may be absent.
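
Rule 2 amounts to a small decision function. A sketch, assuming the caller has already counted the non-vocative transcript signals (the function and parameter names are hypothetical):

```python
from typing import Optional

def calendar_confidence(on_calendar: bool, vocative_match: bool, other_signals: int) -> Optional[str]:
    """Apply Rule 2; None means 'do not assign on calendar evidence'."""
    if not on_calendar:
        return None          # Rule 2 does not apply
    if vocative_match:
        return "CERTAIN"     # calendar + vocative match
    if other_signals >= 1:
        return "HIGH"        # calendar + one other transcript signal
    return None              # calendar alone: the invitee may be absent
```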

Rule 3: Vocative Scanning & Matching

Vocatives matched in Step 5 now resolve to full names via the People context table.

Vocative directly matches a single People note = CERTAIN or HIGH
Multiple independent vocatives resolve to same person = CERTAIN
Unmatched vocative (recovery failed) = flag but do not assign

Rule 4: Vocative-to-Response Mapping

The speaker who talks immediately after being called by name is likely that person. If Speaker 1 says "Tom, go ahead" and Speaker 3 speaks next, Speaker 3 is likely Tom.

Multiple vocative-responses for the same mapping = CERTAIN
Single vocative-response = HIGH
Conflicting mappings = LOW
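
One way to tally vocative-responses is per speaker tag, which fits the instruction to evaluate each tag independently. The sketch below assumes turns arrive as ordered `(speaker_label, called_name_or_None)` pairs and reads "conflicting" as one tag responding to different names; both the input shape and that reading are assumptions.

```python
from collections import Counter, defaultdict

def map_vocative_responses(turns):
    """Return {speaker_label: (proposed_name, confidence)} per Rule 4."""
    votes = defaultdict(Counter)  # responding label -> Counter of called names
    for (speaker, called), (next_speaker, _) in zip(turns, turns[1:]):
        if called and next_speaker != speaker:
            votes[next_speaker][called] += 1
    result = {}
    for label, counter in votes.items():
        name, count = counter.most_common(1)[0]
        if len(counter) > 1:
            result[label] = (name, "LOW")       # conflicting mappings
        elif count > 1:
            result[label] = (name, "CERTAIN")   # multiple vocative-responses
        else:
            result[label] = (name, "HIGH")      # single vocative-response
    return result
```

Because votes are keyed by tag, the same name can legitimately be proposed for two different tags.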

Rule 5: First Name to Full Name Resolution + Context Matching (OPTIMIZED)

Resolve first names to full names:

Check calendar attendees — if exactly one has that first name, use them.
Check People context Full Name and Nickname columns.

First-name collision (multiple candidates share first name): try disambiguation in order:

Context match (NEW) — if transcript topic/discussion matches one candidate's Role/Context value, resolve at HIGH confidence with evidence "role/context match: [Context]"
- Example: Speaker discusses "VMware patching" and context table has "Gregory Kanis - VMware Administrator, Platform Team" → HIGH match
- Example: Speaker discusses "observability deployment" and context table has "Tanner Bragg - SRE, Platform Team" and "Tanner Smith - Finance" → HIGH match to Bragg
Calendar preference — if exactly one candidate is a calendar attendee, use them.
Neither resolves — flag as LOW with all candidates and their Context values listed.
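
The disambiguation order can be sketched as below. Keyword overlap is only a crude stand-in for the topical judgment the rule describes (it would not connect "observability" to "SRE", for instance), and the function name, input shapes, and returned rule labels are all hypothetical.

```python
import re

def disambiguate(candidates, topic_terms, calendar_names):
    """Return (full_name_or_None, rule, evidence) for a first-name collision.

    candidates: dicts with 'full_name' and 'context' that share a first name.
    """
    terms = {t.casefold() for t in topic_terms}
    context_hits = [
        c for c in candidates
        if terms & {w.casefold() for w in re.findall(r"\w+", c["context"])}
    ]
    if len(context_hits) == 1:  # tried first: role/context match
        hit = context_hits[0]
        return hit["full_name"], "context", f"role/context match: {hit['context']}"
    on_calendar = [c for c in candidates if c["full_name"] in calendar_names]
    if len(on_calendar) == 1:   # tried second: calendar preference
        return on_calendar[0]["full_name"], "calendar", "calendar preference"
    listing = "; ".join(f"{c['full_name']} ({c['context']})" for c in candidates)
    return None, "unresolved", f"first-name collision: {listing}"
```

A "context" result corresponds to HIGH confidence and "unresolved" to LOW with all candidates listed, as stated in the rule.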

Rule 6: Alias / Transcription Error Handling

For unresolved stubs, check Nicknames for phonetically similar matches to words spoken near that speaker. Confidence: LOW unless corroborated by Role/Context.

Confidence Levels (REFINED)

CERTAIN: microphone user, calendar + vocative, multiple vocative-responses, or multiple independent signals agreeing
HIGH: calendar + single signal, single vocative-response, vocative + context match, or calendar + role/context alignment
LOW: single weak signal, phonetic guess, ambiguous match, or unresolved collision
null: no evidence found

Build Proposed Mapping

For each speaker, record:

index — 0 for Microphone, N for Speaker N
original_name — stub label from transcript
proposed_name — resolved full name, or null (the same person may appear for multiple tags — this is correct)
confidence — CERTAIN, HIGH, LOW, or null if unresolved
evidence — brief signal description (include "role/context match: [matching field]" if used)

Do not downgrade confidence or skip a match because the same person was already proposed for a different tag. Evaluate each tag on its own evidence.
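
The per-speaker record could be represented as a small dataclass before it is rendered into the caller's output format. This is only one possible in-memory shape; the class name is hypothetical and the actual output format is whatever the caller specifies.

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class SpeakerProposal:
    index: int                    # 0 for Microphone, N for Speaker N
    original_name: str            # stub label from the transcript
    proposed_name: Optional[str]  # resolved full name, or None
    confidence: Optional[str]     # "CERTAIN", "HIGH", "LOW", or None
    evidence: str                 # brief signal description

proposals = [
    SpeakerProposal(1, "Speaker 1", "Jane Doe", "CERTAIN", "multiple vocative-responses"),
    SpeakerProposal(3, "Speaker 3", "Jane Doe", "HIGH", "single vocative-response"),
    SpeakerProposal(2, "Speaker 2", None, None, ""),
]
rows = [asdict(p) for p in proposals]  # plain dicts, ready for serialization
```

Note that "Jane Doe" appears for two tags, which is the expected duplicate-assignment case.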

Step 7: Return Results

Return the mapping in the output format specified by the caller. Do not write changes to the transcript.

Caching & Performance Notes

Roster Cache Strategy:

Cache is eligible if meeting_subject exists AND is_recurring is true
Cached People context remains valid for 14 days
On cache hit: skip the Step 3b People note reads entirely (major time savings)
On cache miss or new names: merge results into cache for future runs
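
The eligibility and freshness checks can be sketched as one predicate. The function name and the `generated` parameter (the cache's generated date) are hypothetical, and treating day 14 itself as still valid is an assumption.

```python
from datetime import date, timedelta
from typing import Optional

def cache_is_usable(meeting_subject: Optional[str], is_recurring: bool,
                    generated: Optional[date], today: date,
                    max_age_days: int = 14) -> bool:
    """True when the roster cache is both eligible and still fresh."""
    eligible = bool(meeting_subject) and is_recurring is True
    fresh = generated is not None and (today - generated) <= timedelta(days=max_age_days)
    return eligible and fresh
```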

Tool Call Batching Summary:

Step 2: 3 parallel calls (Read transcript frontmatter, wc, Glob roster cache) + transcript body chunks in parallel
Step 3: Read parent meeting note → Parse invitees → Read all invitee People notes in parallel (deterministic, no glob)
Step 5c: Recovery Glob batch (vocative recovery only) → Wait → Recovery Read batch (if needed)
Total batches: ~4-5 (vs ~10+ in non-optimized version; eliminates broad name-based glob phase)

Expected Impact:

With fresh People context build: 1-2 minute baseline
With roster cache hit: 30-40% faster (skips the Step 3b People note reads)
With context matching: 15-20% fewer unresolved speakers vs. non-optimized
