[WIP] Add stable timestamps mode with VAD-aware timing by thewh1teagle · Pull Request #3675 · ggml-org/whisper.cpp

thewh1teagle · 2026-02-23T01:12:35Z

Add a new stable timestamps mode that makes word/segment timing less likely to drift into silence.

What it includes:

- VAD-based silence map and post-hoc timestamp snapping
- DTW alignment improvements (gap padding + dynamic head selection)
- Silence-constrained timestamp decoding via logits filtering
- CLI/API wiring for --stable-timestamps
- Synthetic TTS verification scripts to compare baseline vs stable outputs and timing quality
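
To make the first item concrete, here is a minimal sketch of what post-hoc snapping against a silence map could look like. It is an illustration only, not the code in this PR: the function name `snap_interval`, the tuple layout of the silence map, and the choice to leave fully silent words untouched are all assumptions.

```python
from typing import List, Tuple

Span = Tuple[float, float]  # (start_sec, end_sec)


def snap_interval(start: float, end: float, silences: List[Span]) -> Span:
    """Move a word/segment boundary out of any silent span it falls in.

    Hypothetical helper. `silences` is assumed to be a sorted list of
    non-overlapping (s0, s1) spans in seconds, e.g. derived from VAD output.
    """
    for s0, s1 in silences:
        if s0 <= start and end <= s1:
            # Entirely inside silence: nothing to snap to, leave it unchanged.
            return start, end
        if s0 <= start < s1:
            # Start drifted into silence: push it to where the silence ends.
            start = s1
        elif s0 < end <= s1:
            # End drifted into silence: pull it back to where the silence begins.
            end = s0
    return start, end


silence_map = [(1.0, 2.0), (5.0, 6.0)]
print(snap_interval(1.8, 2.6, silence_map))  # (2.0, 2.6)
print(snap_interval(4.2, 5.3, silence_map))  # (4.2, 5.0)
```

A word that spans a whole silent region is left alone in this sketch; splitting or re-aligning such words would need the DTW information mentioned in the second item.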

Early 5-minute synthetic verification results (large-v3-turbo): start-in-silence dropped 43.3% -> 10.9% (213 -> 55), silence overlaps dropped 240 -> 88, WER improved 23.8% -> 2.6%, CER improved 21.6% -> 2.2%, and token count stayed close (492 -> 506). Runtime in this run was 10.6s baseline vs 30.2s stable (needs optimization).
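
As a quick sanity check on these figures, the start-in-silence percentages are consistent with dividing the raw counts by the token counts reported in the same paragraph. That choice of denominator is an inference from the numbers, not something the description states. The helper name `pct` is made up for this sketch.

```python
def pct(count: int, total: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100.0 * count / total, 1)


# Assumption: the denominators are the reported token counts (492 and 506).
print(pct(213, 492))          # 43.3  baseline start-in-silence
print(pct(55, 506))           # 10.9  stable start-in-silence

# Relative reduction in silence overlaps (240 -> 88).
print(pct(240 - 88, 240))     # 63.3

# Slowdown of stable mode in this run (30.2s vs 10.6s).
print(round(30.2 / 10.6, 2))  # 2.85
```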

…stamps

Replace concatenate-decode-remap pipeline with per-segment VAD decoding, matching how stable-ts/faster-whisper works. Each VAD speech segment is decoded independently and timestamps are offset by the segment's original start time, so no mapping table or interpolation is needed.

Results on 5-min synthetic audio (46 utterances, 7x 20s pauses):
- pct_words_overlap: 0.89% (vs 5.7% stable-ts, 22.6% previous v2)
- n_words_overlap: 5 (vs 22 stable-ts, 144 previous v2)
- Wall time: 22.8s (vs 43.2s stable-ts, 1.9x faster via Metal)

Code removed:
- whisper_vad() concatenation + mapping table building
- vad_time_mapping struct, vad_mapping_table, has_vad_segments from state
- map_processed_to_original_time() in whisper.cpp
- whisper_stable_map_processed_to_original() in whisper-stable.cpp
- mapping params from whisper_stable_snap_segments()

Code added:
- whisper_full_vad_segments(): ~70-line per-segment decode loop
- whisper_full_parallel() with VAD delegates to whisper_full()

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
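
The per-segment idea in this commit message can be sketched in a few lines. This is a language-neutral illustration in Python, not the C++ in `whisper_full_vad_segments()`: the `Word` type, the `decode_per_segment` name, and the `decode` callback (standing in for a full decode of one audio chunk) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Word:
    text: str
    t0: float  # start time in seconds
    t1: float  # end time in seconds


def decode_per_segment(
    samples: Sequence[float],
    sample_rate: int,
    speech_segments: List[Tuple[float, float]],
    decode: Callable[[Sequence[float]], List[Word]],
) -> List[Word]:
    """Decode each VAD speech segment on its own and shift its timestamps.

    `decode` is a placeholder for the real decoder; it is assumed to return
    words timed relative to the start of the chunk it was given.
    """
    out: List[Word] = []
    for seg_start, seg_end in speech_segments:
        i0 = int(seg_start * sample_rate)
        i1 = int(seg_end * sample_rate)
        for w in decode(samples[i0:i1]):
            # Offsetting by the segment's original start time is the only
            # remapping needed, so no lookup table or interpolation is used.
            out.append(Word(w.text, w.t0 + seg_start, w.t1 + seg_start))
    return out
```

Because silence between segments is never fed to the decoder, a word's timestamps cannot land outside the speech segment it was decoded from, provided `decode` keeps them within its chunk.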


thewh1teagle force-pushed the feature/stable-timestamps branch 3 times, most recently from cdf2ea4 to b967a8f on February 23, 2026 04:07

thewh1teagle mentioned this pull request Mar 5, 2026

Stable timestamps thewh1teagle/vibe#998

Closed

thewh1teagle and others added 2 commits March 6, 2026 02:15

Add stable timestamps module and verification scripts

fcbd188

thewh1teagle force-pushed the feature/stable-timestamps branch from ab381e3 to 4f5d796 on March 6, 2026 00:15

chore: remove validation artifacts from PR

49d736e

Plans, notes, test outputs, and benchmark scripts are internal development artifacts — not relevant to the PR review. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

thewh1teagle wants to merge 3 commits into ggml-org:master from thewh1teagle:feature/stable-timestamps