Preparing Transcripts

Transcribo Markdown format specification, speaker naming conventions, and common fixes.

The pipeline expects a specific Markdown format produced by Transcribo. This page covers the format specification and how to fix common issues.

Transcribo Markdown Format

Each speaker turn must follow this exact pattern:

[HH:MM:SS → HH:MM:SS] **Speaker Name:** Utterance text...

Breaking it down:

| Component | Format | Example |
|---|---|---|
| Timestamps | [HH:MM:SS → HH:MM:SS] | [00:12:34 → 00:13:45] |
| Arrow | Unicode RIGHTWARDS ARROW (U+2192) | →, not ASCII -> |
| Speaker | **Name:** (bold, with colon) | **Chris Moore:** |
| Text | Free-form text | Welcome everyone to week one. |
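
To make the pattern concrete, here is a minimal Python sketch of a regular expression that accepts one speaker turn. The names `TURN_RE` and `parse_turn` are hypothetical, and the expression is reconstructed from the description on this page; the pipeline's own parser may be stricter or looser.

```python
import re

# Assumed pattern, rebuilt from the format description; not the pipeline's actual regex.
TURN_RE = re.compile(
    r"^\[(\d{2}:\d{2}:\d{2}) → (\d{2}:\d{2}:\d{2})\] "  # [start → end], Unicode arrow U+2192
    r"\*\*(.+?):\*\* "                                   # **Speaker Name:**
    r"(.*)$"                                             # free-form utterance text
)

def parse_turn(line):
    """Return (start, end, speaker, text) for a well-formed turn, else None."""
    match = TURN_RE.match(line.strip())
    return match.groups() if match else None

# A well-formed line parses into its four components.
print(parse_turn("[00:00:15 → 00:01:02] **Chris Moore:** Welcome everyone to week one."))
# The same line with an ASCII arrow does not match and yields None.
print(parse_turn("[00:00:15 -> 00:01:02] **Chris Moore:** Welcome everyone to week one."))
```

Running a check like this over a transcript before handing it to the pipeline is one way to spot lines that would not be recognized.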

The Unicode Arrow Requirement

The parser matches only the Unicode rightwards arrow → (U+2192), not the ASCII sequence -> (a hyphen followed by a greater-than sign). Transcribo's refined output uses the correct Unicode character by default. If you have edited the transcript manually, verify that your editor or an autocorrect feature has not replaced it.

To type the Unicode arrow:

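  • macOS: Press Control + Command + Space to open the Character Viewer, then search for "rightwards arrow". (Option + the right arrow key moves the cursor by one word; it does not type the character.)
  • Copy-paste: Copy this character: →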

Example Transcript

Here is a correctly formatted transcript:

[00:00:15 → 00:01:02] **Chris Moore:** Welcome everyone to week one. Today we're going to explore visual thinking as a tool for understanding complex systems.

[00:01:05 → 00:01:45] **Student A:** I've been reading about concept maps. How do they relate to what we're doing here?

[00:01:48 → 00:02:30] **Chris Moore:** Great question. Concept maps are one technique, but we're going to go deeper. Visual thinking isn't just about drawing — it's about externalizing your mental models so you can examine them.

[00:02:35 → 00:03:10] **Student B:** I find that sketching while I read helps me retain information better. Is there research on that?

[00:03:15 → 00:04:00] **Chris Moore:** Absolutely. Dual coding theory from Paivio suggests that when you encode information both verbally and visually, recall improves significantly.

[00:04:05 → 00:04:40] **Student A:** Building on what Student B said, I think the act of choosing what to draw forces you to prioritize. You can't sketch everything.

[00:04:45 → 00:05:20] **Student C:** But doesn't that mean you might miss important details? If you're focused on drawing, you could overlook nuances in the text.

[00:05:25 → 00:06:00] **Chris Moore:** That's a real tension. The constraint of visual representation forces selection, which is both its strength and its limitation.

Speaker Naming Conventions

Use consistent names throughout the transcript:

  • Use the same name every time a person speaks. "Chris Moore" must always appear as "Chris Moore", not sometimes "Chris" or "Moore".
  • Student names can be real names or pseudonyms — the pipeline anonymizes them regardless. Transcribo typically uses the names from its speaker identification.
  • The instructor name should match what you pass to the --instructor argument (defaults to "Chris Moore").

Common Formatting Issues

Problem: ASCII arrows instead of Unicode

# Wrong
[00:12:34 -> 00:13:45] **Speaker:** Text...

# Correct
[00:12:34 → 00:13:45] **Speaker:** Text...

Fix: Find and replace -> with → in your text editor.
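
A plain find-and-replace also changes any -> that appears inside utterance text. As an alternative, the following Python sketch restricts the replacement to timestamp ranges. The names `ASCII_ARROW_RE` and `fix_arrows` are hypothetical and not part of the pipeline.

```python
import re

# Matches an ASCII arrow only when it sits between two HH:MM:SS timestamps in brackets.
ASCII_ARROW_RE = re.compile(r"\[(\d{2}:\d{2}:\d{2})\s*->\s*(\d{2}:\d{2}:\d{2})\]")

def fix_arrows(text):
    """Replace '->' with U+2192 in timestamp ranges, leaving utterance text untouched."""
    return ASCII_ARROW_RE.sub(r"[\1 → \2]", text)

sample = "[00:12:34 -> 00:13:45] **Speaker:** Map a -> b in the diagram."
print(fix_arrows(sample))
# prints: [00:12:34 → 00:13:45] **Speaker:** Map a -> b in the diagram.
```

To apply it to a file, read the transcript as UTF-8, pass the contents through `fix_arrows`, and write the result back as UTF-8 so the arrow is preserved.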

Problem: Missing bold markers on speaker name

# Wrong
[00:12:34 → 00:13:45] Speaker Name: Text...

# Correct
[00:12:34 → 00:13:45] **Speaker Name:** Text...

Fix: Ensure speaker names are wrapped in **double asterisks** and followed by a colon.
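
For transcripts with many unbolded names, a small script can add the markers. This is a rough Python sketch, assuming the speaker label is everything between the timestamp range and the first colon; `UNBOLDED_RE` and `bold_speaker` are hypothetical names. Review the result by hand, since a line whose utterance follows the timestamp with no speaker label could be misread.

```python
import re

# A timestamp range followed by a plain "Speaker Name: " with no asterisks.
UNBOLDED_RE = re.compile(
    r"^(\[\d{2}:\d{2}:\d{2} → \d{2}:\d{2}:\d{2}\]) "  # timestamp range
    r"([^*:]+): "                                       # unbolded speaker label
)

def bold_speaker(line):
    """Wrap an unbolded speaker label in double asterisks; leave other lines unchanged."""
    return UNBOLDED_RE.sub(r"\1 **\2:** ", line)

print(bold_speaker("[00:12:34 → 00:13:45] Speaker Name: Text..."))
# prints: [00:12:34 → 00:13:45] **Speaker Name:** Text...
```

Lines that already carry bold markers start the label with an asterisk, so the pattern does not match them and they pass through unchanged.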

Problem: Inconsistent speaker names

# Wrong — same person, different names
[00:12:34 → 00:13:45] **Chris:** First utterance...
[00:14:00 → 00:14:30] **Chris Moore:** Second utterance...

# Correct — consistent naming
[00:12:34 → 00:13:45] **Chris Moore:** First utterance...
[00:14:00 → 00:14:30] **Chris Moore:** Second utterance...

Fix: Search for the shorter name variant and replace with the full, consistent name.
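
Finding the variants in the first place can be tedious in a long transcript. The Python sketch below counts each speaker label and flags pairs where one name's words are a strict subset of another's, such as "Chris" and "Chris Moore". The function names and the subset heuristic are illustrative assumptions; the heuristic will not catch misspellings or nicknames.

```python
import re
from collections import Counter

# Captures the speaker label from a timestamped turn.
SPEAKER_RE = re.compile(r"^\[[^\]]+\] \*\*(.+?):\*\*")

def speaker_counts(lines):
    """Count how many turns each speaker label has."""
    return Counter(m.group(1) for line in lines if (m := SPEAKER_RE.match(line)))

def likely_variants(names):
    """Return (short, full) pairs where every word of `short` also appears in `full`."""
    pairs = []
    for short in names:
        for full in names:
            if short != full and set(short.split()) < set(full.split()):
                pairs.append((short, full))
    return pairs

lines = [
    "[00:12:34 → 00:13:45] **Chris:** First utterance...",
    "[00:14:00 → 00:14:30] **Chris Moore:** Second utterance...",
    "[00:14:35 → 00:15:00] **Student A:** Third utterance...",
]
counts = speaker_counts(lines)
print(counts)
print(likely_variants(sorted(counts)))
```

Labels such as "Student A" and "Student B" are not flagged, because neither is a subset of the other.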

Problem: Missing timestamps

# Wrong
**Speaker Name:** Text without timestamps...

# Correct
[00:12:34 → 00:13:45] **Speaker Name:** Text with timestamps...

Fix: Add a timestamp range to every speaker turn. The parser silently skips lines without timestamps, so if important content is missing from the output, check the transcript for timestamp-less lines.
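
Because skipped lines produce no error, it can help to scan for them before running the pipeline. The following Python sketch reports lines that begin with a bold speaker label but have no timestamp range in front. `SPEAKER_ONLY_RE` and `untimestamped_turns` are hypothetical names, not part of the pipeline.

```python
import re

# A line that starts directly with **Speaker Name:** and therefore has no timestamp range.
SPEAKER_ONLY_RE = re.compile(r"^\*\*.+?:\*\* ")

def untimestamped_turns(lines):
    """Return (line_number, line) for speaker lines lacking a leading timestamp range."""
    return [
        (number, line)
        for number, line in enumerate(lines, start=1)
        if SPEAKER_ONLY_RE.match(line.lstrip())
    ]

lines = [
    "[00:12:34 → 00:13:45] **Speaker Name:** Text with timestamps...",
    "**Speaker Name:** Text without timestamps...",
    "## Summary",
]
for number, line in untimestamped_turns(lines):
    print(f"line {number}: {line}")
```

Headers and other non-speaker lines are not reported, since they do not start with a bold label followed by a colon.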

Problem: Extra whitespace or formatting in the transcript

Transcribo sometimes adds section headers, horizontal rules, or summary sections. These are harmless and need no fix: the parser only matches lines that follow the timestamp pattern and ignores everything else.