Preparing Transcripts
Transcribo Markdown format specification, speaker naming conventions, and common fixes.
The pipeline expects a specific Markdown format produced by Transcribo. This page covers the format specification and how to fix common issues.
Transcribo Markdown Format
Each speaker turn must sit on its own line and follow this exact pattern:
[HH:MM:SS → HH:MM:SS] **Speaker Name:** Utterance text...
Breaking it down:
| Component | Format | Example |
|---|---|---|
| Timestamps | [HH:MM:SS → HH:MM:SS] | [00:12:34 → 00:13:45] |
| Arrow | Unicode RIGHTWARDS ARROW (U+2192) | → (not ASCII ->) |
| Speaker | **Name:** (bold, with colon) | **Chris Moore:** |
| Text | Free-form text | Welcome everyone to week one. |
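To make the pattern concrete, here is a minimal Python sketch of a line matcher for the format in the table. The regex `TURN_RE`, the `Turn` tuple, and `parse_turns` are illustrative names, not the pipeline's actual code, and the real parser may be stricter or looser (for example about spacing).

```python
import re
from typing import NamedTuple

# Hypothetical pattern approximating the Transcribo turn format;
# the pipeline's actual regex may differ.
TURN_RE = re.compile(
    r"^\[(\d{2}:\d{2}:\d{2}) \u2192 (\d{2}:\d{2}:\d{2})\] \*\*(.+?):\*\* (.*)$"
)


class Turn(NamedTuple):
    start: str
    end: str
    speaker: str
    text: str


def parse_turns(transcript: str) -> list[Turn]:
    """Return one Turn per line that matches the pattern; other lines are ignored."""
    turns = []
    for line in transcript.splitlines():
        match = TURN_RE.match(line.strip())
        if match:
            turns.append(Turn(*match.groups()))
    return turns


sample = "[00:12:34 → 00:13:45] **Chris Moore:** Welcome everyone to week one."
print(parse_turns(sample))
```

Lines that don't match, such as headers or lines using `->`, simply produce no `Turn`.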
The Unicode Arrow Requirement
The parser specifically looks for the Unicode right arrow character → (U+2192), not the ASCII hyphen-greater-than ->. Transcribo's refined output uses the correct Unicode character by default. If you've edited the transcript manually, verify you haven't accidentally replaced it.
To type the Unicode arrow:
- macOS: Open the Character Viewer with Control + Command + Space and search for "arrow".
- Copy-paste: Copy this character: →
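If you generate or patch transcripts from a script rather than by hand, writing the escape `\u2192` avoids depending on your keyboard or editor. A minimal illustration, assuming Python:

```python
import unicodedata

ARROW = "\u2192"  # the character the parser expects between timestamps

# prints: → RIGHTWARDS ARROW
print(ARROW, unicodedata.name(ARROW))

line = f"[00:12:34 {ARROW} 00:13:45] **Chris Moore:** Welcome everyone."
# prints: True False
print(ARROW in line, "->" in line)
```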
Example Transcript
Here is a correctly formatted transcript:
[00:00:15 → 00:01:02] **Chris Moore:** Welcome everyone to week one. Today we're going to explore visual thinking as a tool for understanding complex systems.

[00:01:05 → 00:01:45] **Student A:** I've been reading about concept maps. How do they relate to what we're doing here?

[00:01:48 → 00:02:30] **Chris Moore:** Great question. Concept maps are one technique, but we're going to go deeper. Visual thinking isn't just about drawing — it's about externalizing your mental models so you can examine them.

[00:02:35 → 00:03:10] **Student B:** I find that sketching while I read helps me retain information better. Is there research on that?

[00:03:15 → 00:04:00] **Chris Moore:** Absolutely. Dual coding theory from Paivio suggests that when you encode information both verbally and visually, recall improves significantly.

[00:04:05 → 00:04:40] **Student A:** Building on what Student B said, I think the act of choosing what to draw forces you to prioritize. You can't sketch everything.

[00:04:45 → 00:05:20] **Student C:** But doesn't that mean you might miss important details? If you're focused on drawing, you could overlook nuances in the text.

[00:05:25 → 00:06:00] **Chris Moore:** That's a real tension. The constraint of visual representation forces selection, which is both its strength and its limitation.
Speaker Naming Conventions
Use consistent names throughout the transcript:
- Use the same name every time a person speaks. "Chris Moore" must always appear as "Chris Moore", not sometimes "Chris" or "Moore".
- Student names can be real names or pseudonyms — the pipeline anonymizes them regardless. Transcribo typically uses the names from its speaker identification.
- The instructor name should match the value you pass to the `--instructor` argument (defaults to "Chris Moore").
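To spot naming variants before running the pipeline, it helps to list every distinct speaker label with its turn count. A minimal Python sketch (`speaker_counts` is a hypothetical helper, not part of the pipeline):

```python
import re
from collections import Counter

# Hypothetical helper pattern: captures the bold speaker label after a timestamp.
SPEAKER_RE = re.compile(r"^\[[^\]]*\] \*\*(.+?):\*\*")


def speaker_counts(transcript: str) -> Counter:
    """Count turns per speaker label so near-duplicate names stand out."""
    counts = Counter()
    for line in transcript.splitlines():
        match = SPEAKER_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


transcript = """\
[00:12:34 → 00:13:45] **Chris:** First utterance...
[00:14:00 → 00:14:30] **Chris Moore:** Second utterance...
[00:14:35 → 00:15:00] **Student A:** A question...
"""
for name, n in sorted(speaker_counts(transcript).items()):
    print(f"{n:3}  {name}")
# prints:
#   1  Chris
#   1  Chris Moore
#   1  Student A
```

Here "Chris" and "Chris Moore" appear as separate entries, which signals a variant to merge. The instructor's entry should also match the name you pass to `--instructor`.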
Common Formatting Issues
Problem: ASCII arrows instead of Unicode
# Wrong
[00:12:34 -> 00:13:45] **Speaker:** Text...

# Correct
[00:12:34 → 00:13:45] **Speaker:** Text...
Fix: Find and replace -> with → in your text editor.
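A global find-and-replace also rewrites any `->` that appears inside utterance text. If that matters, a minimal Python sketch like the following (`fix_arrows` is a hypothetical name) limits the change to the leading timestamp:

```python
import re

# Matches an ASCII arrow only inside a leading [HH:MM:SS -> HH:MM:SS] timestamp.
ASCII_TS_RE = re.compile(
    r"^\[(\d{2}:\d{2}:\d{2}) -> (\d{2}:\d{2}:\d{2})\]", re.MULTILINE
)


def fix_arrows(transcript: str) -> str:
    """Replace '->' with '→' in timestamps, leaving utterance text untouched."""
    # A function replacement avoids escape handling in the replacement string.
    return ASCII_TS_RE.sub(
        lambda m: f"[{m.group(1)} \u2192 {m.group(2)}]", transcript
    )


before = "[00:12:34 -> 00:13:45] **Speaker:** Map A -> B."
print(fix_arrows(before))
# prints: [00:12:34 → 00:13:45] **Speaker:** Map A -> B.
```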
Problem: Missing bold markers on speaker name
# Wrong
[00:12:34 → 00:13:45] Speaker Name: Text...

# Correct
[00:12:34 → 00:13:45] **Speaker Name:** Text...
Fix: Wrap each speaker name in double asterisks, with the colon inside the closing asterisks (**Speaker Name:**).
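When many labels are unbolded, a regex substitution can add the markers. This hypothetical Python sketch (`bold_speakers` is an illustrative name) treats whatever sits between the timestamp and the first colon as the speaker name, so review the result: an unlabeled line whose text happens to contain a colon would be wrapped too.

```python
import re

# Hypothetical fix: a timestamp followed by a plain "Name:" label (no asterisks).
UNBOLDED_RE = re.compile(
    r"^(\[\d{2}:\d{2}:\d{2} \u2192 \d{2}:\d{2}:\d{2}\]) ([^*:\n]+?): ",
    re.MULTILINE,
)


def bold_speakers(transcript: str) -> str:
    """Wrap plain 'Name:' labels as '**Name:**'; already-bold labels are skipped."""
    return UNBOLDED_RE.sub(lambda m: f"{m.group(1)} **{m.group(2)}:** ", transcript)


print(bold_speakers("[00:12:34 → 00:13:45] Speaker Name: Text..."))
# prints: [00:12:34 → 00:13:45] **Speaker Name:** Text...
```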
Problem: Inconsistent speaker names
# Wrong — same person, different names
[00:12:34 → 00:13:45] **Chris:** First utterance...
[00:14:00 → 00:14:30] **Chris Moore:** Second utterance...

# Correct — consistent naming
[00:12:34 → 00:13:45] **Chris Moore:** First utterance...
[00:14:00 → 00:14:30] **Chris Moore:** Second utterance...
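Fix: Search for each shorter variant as a full label (for example **Chris:**) and replace it with the full, consistent label (**Chris Moore:**). Including the asterisks and colon in the search leaves mentions of the name inside utterances unchanged and avoids turning an existing "Chris Moore" into "Chris Moore Moore".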
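With several variants to merge, an alias table applied only to speaker labels keeps the edit targeted. A hypothetical Python sketch; `ALIASES` and `normalize_speakers` are illustrative names, and the alias entries are examples you would replace with your own:

```python
import re

# Hypothetical alias table: variant label -> canonical name.
ALIASES = {"Chris": "Chris Moore", "Moore": "Chris Moore"}

# Captures the timestamp prefix and the bold speaker label at the start of a line.
LABEL_RE = re.compile(r"^(\[[^\]]*\] )\*\*(.+?):\*\*", re.MULTILINE)


def normalize_speakers(transcript: str, aliases: dict[str, str]) -> str:
    """Rewrite speaker labels only; names mentioned inside utterances are untouched."""

    def repl(m: re.Match) -> str:
        name = aliases.get(m.group(2), m.group(2))
        return f"{m.group(1)}**{name}:**"

    return LABEL_RE.sub(repl, transcript)


text = "[00:12:34 → 00:13:45] **Chris:** Thanks, Chris here..."
print(normalize_speakers(text, ALIASES))
# prints: [00:12:34 → 00:13:45] **Chris Moore:** Thanks, Chris here...
```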
Problem: Missing timestamps
# Wrong
**Speaker Name:** Text without timestamps...

# Correct
[00:12:34 → 00:13:45] **Speaker Name:** Text with timestamps...
Fix: Add a timestamp to each such line. The parser silently skips lines without timestamps, so if important content is missing from the output, check for timestamp-less lines.
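To find lines the parser would skip, a quick scan can flag any line that contains a bold **Name:** label but doesn't start with a valid [HH:MM:SS → HH:MM:SS] timestamp. A hypothetical Python sketch (`untimestamped_turns` is an illustrative name); it also catches timestamps that still use `->`, and it may flag bold labels in summary sections Transcribo adds, so treat its output as a list to review:

```python
import re

TIMESTAMP_RE = re.compile(r"^\[\d{2}:\d{2}:\d{2} \u2192 \d{2}:\d{2}:\d{2}\]")
SPEAKER_LABEL_RE = re.compile(r"\*\*[^*]+?:\*\*")


def untimestamped_turns(transcript: str) -> list[tuple[int, str]]:
    """Return (line_number, line) for labeled lines lacking a valid timestamp."""
    flagged = []
    for number, line in enumerate(transcript.splitlines(), start=1):
        stripped = line.strip()
        if SPEAKER_LABEL_RE.search(stripped) and not TIMESTAMP_RE.match(stripped):
            flagged.append((number, stripped))
    return flagged


transcript = """\
[00:12:34 → 00:13:45] **Speaker Name:** Text with timestamps...
**Speaker Name:** Text without timestamps...
## Summary
"""
for number, line in untimestamped_turns(transcript):
    print(f"line {number}: {line}")
# prints: line 2: **Speaker Name:** Text without timestamps...
```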
Problem: Extra whitespace or formatting in the transcript
Transcribo sometimes adds section headers, horizontal rules, or summary sections. These are harmless — the parser only matches lines that follow the timestamp pattern and ignores everything else.