Preparing Transcripts

Transcribo Markdown format specification, speaker naming conventions, and common fixes.

The pipeline expects a specific Markdown format produced by Transcribo. This page covers the format specification and how to fix common issues.

Transcribo Markdown Format

Each speaker turn must follow this exact pattern:

[HH:MM:SS → HH:MM:SS] **Speaker Name:** Utterance text...

Breaking it down:

Component	Format	Example
Timestamps	`[HH:MM:SS → HH:MM:SS]`	`[00:12:34 → 00:13:45]`
Arrow	Unicode RIGHTWARDS ARROW `→` (U+2192)	Not ASCII `->`
Speaker	`**Name:**` (bold, with the colon inside the bold markers)	`**Chris Moore:**`
Text	Free-form text	`Welcome everyone to week one.`
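
To check a line against the components in the table above, a regular expression is usually enough. The following is a minimal sketch in Python; `TURN_RE` and `parse_turn` are hypothetical names, and the pattern is inferred from the format described on this page, so the pipeline's actual parser may be stricter or more lenient.

```python
import re

# Hypothetical pattern inferred from the documented format; not the
# pipeline's own parser. The arrow is the Unicode character U+2192.
TURN_RE = re.compile(
    r"^\[(\d{2}:\d{2}:\d{2}) → (\d{2}:\d{2}:\d{2})\] "
    r"\*\*(.+?):\*\*\s*(.*)$"
)


def parse_turn(line):
    """Return (start, end, speaker, text), or None if the line does not match."""
    match = TURN_RE.match(line.strip())
    return match.groups() if match else None
```

A line that uses the ASCII arrow or lacks the bold markers returns `None`, which mirrors how non-matching lines are ignored.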

The Unicode Arrow Requirement

The parser looks specifically for the Unicode rightwards arrow → (U+2192), not the two-character ASCII sequence -> (hyphen followed by greater-than). Transcribo's refined output uses the Unicode character by default. If you have edited the transcript manually, verify that the arrow was not replaced along the way.

To type the Unicode arrow:

macOS: Open the Character Viewer (Control + Command + Space), search for "rightwards arrow", and double-click the character
Copy-paste: Copy this character: →

Example Transcript

Here is a correctly formatted transcript:

[00:00:15 → 00:01:02] **Chris Moore:** Welcome everyone to week one. Today we're going to explore visual thinking as a tool for understanding complex systems.

[00:01:05 → 00:01:45] **Student A:** I've been reading about concept maps. How do they relate to what we're doing here?

[00:01:48 → 00:02:30] **Chris Moore:** Great question. Concept maps are one technique, but we're going to go deeper. Visual thinking isn't just about drawing — it's about externalizing your mental models so you can examine them.

[00:02:35 → 00:03:10] **Student B:** I find that sketching while I read helps me retain information better. Is there research on that?

[00:03:15 → 00:04:00] **Chris Moore:** Absolutely. Dual coding theory from Paivio suggests that when you encode information both verbally and visually, recall improves significantly.

[00:04:05 → 00:04:40] **Student A:** Building on what Student B said, I think the act of choosing what to draw forces you to prioritize. You can't sketch everything.

[00:04:45 → 00:05:20] **Student C:** But doesn't that mean you might miss important details? If you're focused on drawing, you could overlook nuances in the text.

[00:05:25 → 00:06:00] **Chris Moore:** That's a real tension. The constraint of visual representation forces selection, which is both its strength and its limitation.

Speaker Naming Conventions

Use consistent names throughout the transcript:

Use the same name every time a person speaks. "Chris Moore" must always appear as "Chris Moore", not sometimes "Chris" or "Moore".
Student names can be real names or pseudonyms — the pipeline anonymizes them regardless. Transcribo typically uses the names from its speaker identification.
The instructor name should match what you pass to the --instructor argument (defaults to "Chris Moore").
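
One way to spot inconsistent names before running the pipeline is to count how often each speaker label appears. This is a sketch only: `speaker_counts` is a hypothetical helper, not part of the pipeline, and it assumes labels follow the bracketed timestamp as described above.

```python
import re
from collections import Counter

# Assumed shape of a turn line: "[...] **Name:** text".
SPEAKER_RE = re.compile(r"^\[[^\]]*\]\s*\*\*(.+?):\*\*")


def speaker_counts(transcript):
    """Count turns per speaker label so near-duplicate names stand out."""
    counts = Counter()
    for line in transcript.splitlines():
        match = SPEAKER_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts
```

If the result lists both "Chris" and "Chris Moore", the shorter label is probably a variant that needs replacing.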

Common Formatting Issues

Problem: ASCII arrows instead of Unicode

# Wrong
[00:12:34 -> 00:13:45] **Speaker:** Text...

# Correct
[00:12:34 → 00:13:45] **Speaker:** Text...

Fix: Find and replace -> with → in your text editor. Limit the replacement to the timestamp ranges so that any -> inside utterance text is left alone.
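
For longer transcripts, the replacement can be scripted. The sketch below rewrites the ASCII arrow only when it sits between two timestamps at the start of a line; `fix_arrows` is a hypothetical helper name, and the timestamp shape is assumed from the format specification above.

```python
import re

# Match "[HH:MM:SS -> HH:MM:SS]" at the start of a line, tolerating
# missing or extra spaces around the ASCII arrow.
ASCII_ARROW_RE = re.compile(
    r"^(\[\d{2}:\d{2}:\d{2})\s*->\s*(\d{2}:\d{2}:\d{2}\])",
    re.MULTILINE,
)


def fix_arrows(transcript):
    """Replace the ASCII arrow inside timestamp brackets with U+2192."""
    return ASCII_ARROW_RE.sub(r"\1 → \2", transcript)
```

Because the pattern is anchored to the timestamp brackets, an arrow that appears inside the spoken text is not touched.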

Problem: Missing bold markers on speaker name

# Wrong
[00:12:34 → 00:13:45] Speaker Name: Text...

# Correct
[00:12:34 → 00:13:45] **Speaker Name:** Text...

Fix: Wrap each speaker name, together with its trailing colon, in double asterisks, as in **Speaker Name:**.
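
If many lines are affected, the markers can be added with a script. This sketch assumes that whatever sits between the timestamp and the first colon is the speaker name; `add_bold_markers` is a hypothetical helper, so review the result before using it, since an unlabeled line whose text happens to contain a colon would be wrapped as well.

```python
import re

# Timestamp, then a name without asterisks or colons, then ": ".
UNBOLDED_RE = re.compile(
    r"^(\[\d{2}:\d{2}:\d{2} → \d{2}:\d{2}:\d{2}\]) ([^*:\n]+): ",
    re.MULTILINE,
)


def add_bold_markers(transcript):
    """Wrap unbolded speaker labels (name plus colon) in double asterisks."""
    return UNBOLDED_RE.sub(
        lambda m: f"{m.group(1)} **{m.group(2)}:** ", transcript
    )
```

Lines that already carry the bold markers do not match the pattern and are left unchanged.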

Problem: Inconsistent speaker names

# Wrong — same person, different names
[00:12:34 → 00:13:45] **Chris:** First utterance...
[00:14:00 → 00:14:30] **Chris Moore:** Second utterance...

# Correct — consistent naming
[00:12:34 → 00:13:45] **Chris Moore:** First utterance...
[00:14:00 → 00:14:30] **Chris Moore:** Second utterance...

Fix: Search for the shorter name variant and replace with the full, consistent name.
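
A plain search and replace can also change the name where it occurs inside spoken text. The sketch below renames only the speaker label at the start of a turn; `rename_speaker` is a hypothetical helper, not a pipeline command.

```python
import re


def rename_speaker(transcript, old, new):
    """Rename a speaker label, leaving mentions inside utterances untouched."""
    pattern = re.compile(
        r"^(\[[^\]]*\]\s*)\*\*" + re.escape(old) + r":\*\*",
        re.MULTILINE,
    )
    # A function replacement avoids backslash handling in the new name.
    return pattern.sub(lambda m: f"{m.group(1)}**{new}:**", transcript)
```

Calling `rename_speaker(text, "Chris", "Chris Moore")` updates lines labeled "Chris" and skips lines already labeled "Chris Moore", because the pattern requires the colon directly after the old name.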

Problem: Missing timestamps

# Wrong
**Speaker Name:** Text without timestamps...

# Correct
[00:12:34 → 00:13:45] **Speaker Name:** Text with timestamps...

Fix: Restore the timestamp range at the start of each affected line. The parser silently skips lines without timestamps, so if important content is missing from the output, check the transcript for timestamp-less lines.
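
Because skipped lines produce no warning, it can help to list them before running the pipeline. The sketch below reports lines that begin with a bold speaker label and therefore have no timestamp in front; `untimestamped_turns` is a hypothetical helper, and it may also flag bold labels in summary sections, which are harmless.

```python
import re

# A line that starts directly with "**Name:**" has no timestamp prefix.
LABEL_RE = re.compile(r"^\*\*.+?:\*\*")


def untimestamped_turns(transcript):
    """Return (line_number, line) pairs for turns that lack a timestamp."""
    hits = []
    for number, line in enumerate(transcript.splitlines(), start=1):
        stripped = line.strip()
        if LABEL_RE.match(stripped):
            hits.append((number, stripped))
    return hits
```

Each reported line number points at a turn that would be dropped unless a timestamp range is added.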

Problem: Extra whitespace or formatting in the transcript

Transcribo sometimes adds section headers, horizontal rules, or summary sections. These are harmless — the parser only matches lines that follow the timestamp pattern and ignores everything else.