Performance-optimized version of Speaker Auto-Tag Prompt with batched People note resolution and context-based confidence refinement.

Self-contained prompt for tagging unidentified speakers in one transcript stub using vault data (People notes, calendar attendees). Returns proposed speaker mappings; the caller handles review and writes.

**CRITICAL:** Never read the full transcript in a single Read call. Transcripts exceed the 10k token file-read limit. Use chunked reads (limit=500 per chunk). See Step 2.

**OPTIMIZATION FOCUS:** Read People notes deterministically from the parent meeting invitees list (no broad name-based glob), batch all invitee reads in parallel, build a comprehensive context table before analysis, and use role/context matching for confidence refinement.

---

## Step 1: Setup

If Read, Glob, Grep, and Bash are not already available, load them: ToolSearch("select:Read,Glob,Grep,Bash")

Extract from the calling message:
- Transcript — path to the transcript stub file
- Microphone user — full name of the person at the microphone
- Calendar Attendees — pre-fetched attendee context (optional; skips Step 4 if provided)
- People Roster — pre-built People context table (optional; skips Step 3 if provided — use as the People context table directly)
- People Folder — folder name for People notes (default: People)
- Output format — expected output format for the mapping

Path resolution: try the Transcript path as-is. If not found, error with the path.

Vault root derivation: extract vault root from the transcript path by removing the transcript folder suffix. Use this root combined with People Folder for all lookups.

Derived values (from filename, no read needed):
- MEETING_TITLE — strip date prefix (first 13 chars for "YYYY-MM-DD - " or first 11 chars for "YYYY-MM-DD ") and " - Transcript" suffix
- SANITIZED_SUBJECT — take MEETING_TITLE, replace /, \, : with hyphens. Used for cache Glob in Step 2.

---

## Step 2: Read Transcript + Glob Roster Cache + Glob People Notes

**Round 1 — issue in a single parallel batch:**
- Read(transcript_path, limit=60) — frontmatter only
- Bash: wc -l transcript_path — total line count
- Glob("Caches/Speaker Rosters/SANITIZED_SUBJECT.md") — roster cache for Step 5

Extract from frontmatter:
- session_id (or macwhisper_session_id for WhisperCal stubs)
- speakers — array with name, id, stub flag, line_count per speaker
- meeting_note — wiki-link to parent meeting note (used in Step 3a)
- calendar_event — event title, "none", or absent
- calendar_attendees — array of plain string names (or invitees for WhisperCal, or meeting_invitees for wiki-link format like "- [[Name]]")
- meeting_subject — used to validate cache match
- is_recurring — enables roster cache path
- pipeline_state — current pipeline state
- tags — detect WhisperCal stubs (tags: [transcript])

Pipeline state: any value (tagged, extracted, summarized, titled) or absent means proceed.

**Round 2 — transcript body chunks in a single parallel batch.** Use the line count from Round 1. Read in chunks of 500 lines, starting after the frontmatter:
- Read(transcript_path, offset=61, limit=500) — chunk 1
- Read(transcript_path, offset=561, limit=500) — chunk 2
- Read(transcript_path, offset=1061, limit=500) — chunk 3 if needed
- Continue until total lines are covered.

Most transcripts need 1 to 3 chunks. The full body across all chunks is the primary input for Step 6.

---

## Step 3: Read Invitees from Parent Meeting Note (OPTIMIZED)

**Skip this entire step if** `People Roster` was provided in the invocation parameters. Use the provided roster as the People context table directly and proceed to Step 4.

**CRITICAL:** If not skipped, do NOT skip this step. Complete it before Step 4 to maximize cache hits and build context early.

### 3a. Read Parent Meeting Note

The transcript frontmatter contains `meeting_note` field with a wiki-link to the parent meeting note (e.g., `[[2026-04-04 - Test Transcript]]`).

- Extract the meeting note path from the wiki-link
- Read the meeting note (limit=100 to capture frontmatter + invitees section)
- Extract the `meeting_invitees` field — this contains an array of People note wiki-links or plain attendee names

Example invitees field:
```
meeting_invitees:
  - "[[People/Joe Jackson]]"
  - "[[People/Tanner Bragg]]"
  - "[[People/Gregory Porter]]"
  - "[[People/Andrew Davis]]"
```

Or plain names:
```
meeting_invitees:
  - Joe Jackson
  - Tanner Bragg
  - Gregory Porter
```

### 3b. Parse Invitees & Build File Paths (DETERMINISTIC)

For each invitee:
- If wiki-link format `[[People/Name]]`: extract the path as `People/Name.md`
- If plain name: construct path as `People/{name}.md`

Issue **ALL** Read calls in a single parallel batch. This is deterministic — no glob phase.

Example (if 4 invitees):
```
Read(People/Joe Jackson.md, limit=60)
Read(People/Tanner Bragg.md, limit=60)
Read(People/Gregory Porter.md, limit=60)
Read(People/Andrew Davis.md, limit=60)
```

### 3c. Build People Context Table (Phase 2)

From each People note read in Phase 3b, extract:
- full_name
- nickname (if present)
- role_title
- company/org

Build a Context value per person as:
- `"role_title, company/org"` (both present)
- `"role_title"` (org empty)
- `"company/org"` (role empty)
- `""` (both empty)

Build a People context table with these fields per person:
| Full Name | Nickname | Role/Context | People Note Filename | Source |

Source values: "meeting_invitee", "calendar", "microphone_user", "vocative_recovery"

**This table is the foundation for Step 6 confidence refinement. Do not skip to Step 4 until this is complete.**

### 3d. Roster Cache Merge (If Cache Hit)

If a cache was found in Step 2:
- Compare table from Phase 3c against cached roster
- If all invitees are present in cache: use cached full context
- If new names found: merge Phase 3c results into cache, update cache generated date

---

## Step 4: Calendar Attendees

Skip if Calendar Attendees was provided in invocation (use it directly). Also skip if People Roster was provided.

- calendar_event is "none" — skip calendar lookups entirely (ad-hoc meeting)
- calendar_attendees, invitees, or meeting_invitees populated — use directly (strip [[ ]] wrappers if present)
- No calendar data — proceed without. Calendar context improves confidence but is not required.

Add all calendar attendees to the People context table from Step 3c with Source="calendar".

---

## Step 5: Vocative-to-Speaker Mapping

### 5a. Vocative Scanning (Pre-Step 6)

Scan the full transcript body (from Step 2) for direct address patterns:
- "[Name], go ahead"
- "Thanks, [Name]"
- "[Name], what do you think?"
- "[Name], can you..."
- "Over to you, [Name]"
- "[Name]?" (standalone calling)

### 5b. Match Against People Context

For each vocative detected, check the People context table (Step 3c):
- Match against Full Name (exact or first word)
- Match against Nickname
- Multiple detections for the same name strengthen signal

### 5c. Unmatched Vocative Recovery Batch (OPTIMIZED)

If any vocative names do not match People context:

Collect all unmatched names (e.g., "Kev", "Chew", "Alex", "Joe").

Issue recovery Globs in a single parallel batch:
```
Glob(People Folder/*Kev*.md)
Glob(People Folder/*Chew*.md)
Glob(People Folder/*Alex*.md)
Glob(People Folder/*Joe*.md)
```

For each glob hit:
- Issue a Read in a single parallel batch (Phase 2 repeat)
- Extract full_name, nickname, role_title, company/org
- Add to People context table with Source="vocative_recovery"

For misses: flag the unmatched_vocative in the final evidence field for that speaker.

---

## Step 6: Speaker Identification Analysis

Analyze the full transcript body (from Step 2), speaker stubs (frontmatter), and **complete** People context table (from Step 3c + 5c) using the rules below. Higher-priority rules override lower ones.

**CRITICAL — duplicate assignments are expected and correct.** Transcription engines frequently split one real person across multiple speaker tags (e.g., Speaker 1 and Speaker 3 are both the same person). Propose the best match for each tag independently. Do NOT enforce a one-to-one constraint between people and tags. If the evidence says Speaker 1 is "Jane Doe" and Speaker 3 is also "Jane Doe", propose "Jane Doe" for both.

### Rule 1: Microphone Speaker — CERTAIN

Assign the Microphone user to the "Microphone" label. If no "Microphone" label exists, assign to the speaker with the most lines. Evidence: "microphone". Never overridden.

### Rule 2: Calendar Attendees — CERTAIN or HIGH

- Calendar + vocative match (see Rule 3) = CERTAIN
- Calendar + one other transcript signal (style, topic expertise, vocative response) = HIGH
- Calendar alone with no transcript evidence = do not assign. Invitees may be absent.

### Rule 3: Vocative Scanning & Matching

Vocatives matched in Step 5 now resolve to full names via the People context table.

- Vocative directly matches a single People note = CERTAIN or HIGH
- Multiple independent vocatives resolve to same person = CERTAIN
- Unmatched vocative (recovery failed) = flag but do not assign

### Rule 4: Vocative-to-Response Mapping

The speaker who talks immediately after being called by name is likely that person. If Speaker A says "Tom, go ahead" and Speaker 3 speaks next, Speaker 3 is likely Tom.

- Multiple vocative-responses for the same mapping = CERTAIN
- Single vocative-response = HIGH
- Conflicting mappings = LOW

### Rule 5: First Name to Full Name Resolution + Context Matching (OPTIMIZED)

Resolve first names to full names:

1. Check calendar attendees — if exactly one has that first name, use them.
2. Check People context Full Name and Nickname columns.

**First-name collision (multiple candidates share first name): try disambiguation in order:**

1. **Context match** (NEW) — if transcript topic/discussion matches one candidate's Role/Context value, resolve at HIGH confidence with evidence "role/context match: [Context]"
   - Example: Speaker discusses "VMware patching" and context table has "Gregory Kanis - VMware Administrator, Platform Team" → HIGH match
   - Example: Speaker discusses "observability deployment" and context table has "Tanner Bragg - SRE, Platform Team" and "Tanner Smith - Finance" → HIGH match to Bragg

2. Calendar preference — if exactly one candidate is a calendar attendee, use them.
3. Neither resolves — flag as LOW with all candidates and their Context values listed.

### Rule 6: Alias / Transcription Error Handling

For unresolved stubs, check Nicknames for phonetically similar matches to words spoken near that speaker. Confidence: LOW unless corroborated by Role/Context.

### Confidence Levels (REFINED)

- **CERTAIN:** microphone user, calendar + vocative, multiple vocative-responses, or multiple independent signals agreeing
- **HIGH:** calendar + single signal, single vocative-response, vocative + context match, or calendar + role/context alignment
- **LOW:** single weak signal, phonetic guess, ambiguous match, or unresolved collision
- **null:** no evidence found

### Build Proposed Mapping

For each speaker, record:
- index — 0 for Microphone, N for Speaker N
- original_name — stub label from transcript
- proposed_name — resolved full name, or null (the same person may appear for multiple tags — this is correct)
- confidence — CERTAIN, HIGH, LOW, or null if unresolved
- evidence — brief signal description (include "role/context match: [matching field]" if used)

Do not downgrade confidence or skip a match because the same person was already proposed for a different tag. Evaluate each tag on its own evidence.

---

## Step 7: Return Results

Return the mapping in the output format specified by the caller. Do not write changes to the transcript.

---

## Caching & Performance Notes

**Roster Cache Strategy:**
- Cache is eligible if meeting_subject exists AND is_recurring is true
- Cached People context remains valid for 14 days
- On cache hit: skip Step 3 Phase 2 reads entirely (major time savings)
- On cache miss or new names: merge results into cache for future runs

**Tool Call Batching Summary:**
- Step 2: 3 parallel calls (Read transcript frontmatter, wc, Glob roster cache) + transcript body chunks in parallel
- Step 3: Read parent meeting note → Parse invitees → Read all invitee People notes in parallel (deterministic, no glob)
- Step 5c: Recovery Glob batch (vocative recovery only) → Wait → Recovery Read batch (if needed)
- Total batches: ~4-5 (vs ~10+ in non-optimized version; eliminates broad name-based glob phase)

**Expected Impact:**
- With fresh People context build: 1-2 minute baseline
- With roster cache hit: 30-40% faster (skips Phase 2 reads)
- With context matching: 15-20% fewer unresolved speakers vs. non-optimized