Pipeline Flow
🔄 Complete Pipeline Concurrent Architecture
The following diagram shows the complete concurrent architecture of subtitle generation, including parallel async tasks, Semaphore control, and inter-task dependencies:
🔒 Dual Semaphore Concurrency Control
Configuration:
| Semaphore | Purpose | Default Concurrency | Config Key |
|---|---|---|---|
transcriptionSemaphore | Control Whisper API | Local: 1, Cloud: 5 | whisperConcurrency |
refinementSemaphore | Control Gemini Flash | 5 | concurrencyFlash |
alignmentSemaphore | Control Alignment | 2 | concurrencyAlignment |
| (Glossary internal) | Control Gemini Pro | 2 | concurrencyPro |
📊 6-Stage Chunk Pipeline
🏗️ Pipeline Step Architecture (v2.13)
v2.13 introduced class-based step architecture, modularizing Chunk processing logic:
Step Descriptions:
| Step | File | Input | Output | Purpose |
|---|---|---|---|---|
TranscriptionStep | TranscriptionStep.ts | AudioChunk | SubtitleItem[] | Whisper speech-to-text |
WaitForDepsStep | WaitForDepsStep.ts | - | Glossary + Speakers | Wait for extraction completion |
RefinementStep | RefinementStep.ts | SubtitleItem[] | SubtitleItem[] | Timeline correction, term app |
AlignmentStep | AlignmentStep.ts | SubtitleItem[] | SubtitleItem[] | CTC forced alignment |
TranslationStep | TranslationStep.ts | SubtitleItem[] | SubtitleItem[] | AI translation |
ProofreadStep | ProofreadStep.ts | SubtitleItem[] | SubtitleItem[] | Batch proofreading (optional) |
⚖️ Batch Operations Comparison (v2.13)
| Feature | Proofread | Regenerate |
|---|---|---|
| File | batch/proofread.ts | batch/regenerate.ts |
| Purpose | Polish existing translations | Completely reprocess segments |
| Flow | Gemini Pro only | Full pipeline (Transcribe→Refine→Align→Translate) |
| Input | Existing SubtitleItem[] | Original audio + time range |
| Preserves | Original timestamps | Nothing (regenerates all) |
| Use Case | Improve translation quality | Fix transcription errors, re-segment |
| Model | Gemini 3 Pro | Whisper + Gemini Flash |
🔄 Data Integrity & Reconciliation
The system uses strict data reconciliation strategy (src/services/subtitle/reconciler.ts) to ensure metadata consistency across pipeline stages.
Reconciler Logic
reconcile(prev, curr) function acts as the "data hub" connecting pipeline stages:
- Semantic Metadata (always inherited):
speaker(Speaker ID/Name)comment(User notes)- Logic: Inherited from
prevsegment with highest overlap. Even split segments inherit parent's speaker.
- Internal State (conditionally inherited):
alignmentScore(CTC confidence)lowConfidence(Low confidence flag)- Logic: Strictly inherited only when 1:1 mapping detected. Reset if segments are split/merged.
Alignment Strategy (CTC)
- Engine:
ctcAligner.tscalls externalalign.exe(MMS-300m model) - Function: Updates
startTimeandendTimebased on audio alignment, but never splits or merges segments - Metadata: Adds
alignmentScore. Scores below threshold triggerlowConfidenceflag