whisper.cpp Migration Complete - WhisperSubTranslate Blog

In WhisperSubTranslate v1.3.0, we switched the subtitle extraction engine from faster-whisper to whisper.cpp. This article explains why we made this decision and what changes users can expect.

TL;DR:
Smaller package: ~100MB instead of 2GB+, so downloads are faster
Millisecond timestamps: well suited to music videos and karaoke
Same Whisper models: no noticeable difference in accuracy

Why Did We Change the Engine?

Previous Issues: faster-whisper Limitations

The previous engine, faster-whisper, is a Python-based implementation. It performed well, but it had some drawbacks for us:

Distribution size: The bundled package exceeded 2GB, including the Python runtime and dependencies
Developer setup: Developers needed a Python environment to build and test the app
Timestamp precision: Only second-level timestamps, which is not ideal for music videos

Advantages of whisper.cpp

whisper.cpp is a C/C++ port of OpenAI's Whisper model:

Standalone executable: Runs as a native command-line program, with no Python runtime required
Fully open source: Permissive MIT license, free for commercial and non-commercial use
Active development: Continuously updated under the ggml-org organization
Millisecond timestamps: Short, precisely timed segments via the -ml (maximum segment length) option

Performance Comparison

Results from our own tests:

Item	faster-whisper	whisper.cpp
Developer Environment	Python	Node.js
Timestamp Precision	Seconds	Milliseconds
Processing Time (3-min video)	~21s	~18s
GPU Acceleration	CUDA	CUDA
Distribution Size	~2GB+	~100MB
Recognition Accuracy	Excellent	Excellent

What are Millisecond Timestamps?

Subtitle extraction tools often produce long segments with timestamps rounded to whole seconds, as our previous engine did:

# Second-level timestamps (previous)
1
00:00:00,000 --> 00:00:08,000
Welcome to the channel and today we are going to talk about automation.

With the -ml 50 (maximum segment length of 50 characters) and -sow (split on word) options, whisper.cpp splits the transcript into shorter segments with millisecond timestamps:

# Millisecond timestamps (whisper.cpp)
1
00:00:00,000 --> 00:00:04,640
Welcome to the channel and today we are going to

2
00:00:04,640 --> 00:00:07,520
talk about how to automate anything that is on

3
00:00:07,520 --> 00:00:08,000
web.
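To make the segmentation idea concrete, here is a minimal sketch in TypeScript (the app's developer environment is now Node.js). It assumes word-level timings are already available; the `Word` and `Cue` types and the function names are hypothetical, and this is not how whisper.cpp implements -ml internally (it splits using token-level timestamps). It only illustrates grouping words into cues of at most N characters, breaking between words, and writing SRT timestamps with millisecond precision.

```typescript
// Hypothetical input: one recognized word with its timing in milliseconds.
interface Word {
  text: string;
  startMs: number;
  endMs: number;
}

// One subtitle cue (an SRT block).
interface Cue {
  startMs: number;
  endMs: number;
  text: string;
}

// Group words into cues whose text is at most maxLen characters, splitting
// only between words. A single word longer than maxLen gets its own cue.
function segmentWords(words: Word[], maxLen: number): Cue[] {
  const cues: Cue[] = [];
  let current: Cue | null = null;
  for (const w of words) {
    if (current !== null && current.text.length + 1 + w.text.length <= maxLen) {
      // The word still fits: extend the current cue.
      current.text += " " + w.text;
      current.endMs = w.endMs;
    } else {
      // Close the current cue (if any) and start a new one at this word.
      if (current !== null) cues.push(current);
      current = { startMs: w.startMs, endMs: w.endMs, text: w.text };
    }
  }
  if (current !== null) cues.push(current);
  return cues;
}

// Left-pad a non-negative integer with zeros to the given width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Format milliseconds as an SRT timestamp: HH:MM:SS,mmm
function toSrtTime(ms: number): string {
  const h = Math.floor(ms / 3600000);
  const m = Math.floor((ms % 3600000) / 60000);
  const s = Math.floor((ms % 60000) / 1000);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms % 1000, 3)}`;
}

// Render cues as numbered SRT blocks separated by blank lines.
function toSrt(cues: Cue[]): string {
  return cues
    .map((c, i) => `${i + 1}\n${toSrtTime(c.startMs)} --> ${toSrtTime(c.endMs)}\n${c.text}\n`)
    .join("\n");
}
```

Calling `segmentWords(words, 50)` mirrors the 50-character limit used above; the shorter the limit, the more cues you get and the more closely each cue's start and end follow the audio.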

Where is This Useful?

Music video lyrics: Lyrics sync precisely with music
Karaoke subtitles: Create karaoke-style captions
Live subtitles: Real-time subtitle generation for broadcasts
Educational content: Precisely timed explanatory subtitles

Technical Changes

Detailed changes for developers:

Model Format Change

faster-whisper used the CTranslate2 format (a folder of files), while whisper.cpp uses the GGML format (a single .bin file):

# Previous (faster-whisper / CTranslate2)
_models/
├── small/
│   ├── config.json
│   ├── model.bin
│   ├── tokenizer.json
│   └── vocabulary.txt

# New (whisper.cpp / GGML)
_models/
├── ggml-small.bin      # Single file!
├── ggml-medium.bin
└── ggml-large-v3.bin
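Because the on-disk layout changed, an app has to tell old model folders apart from new model files. Below is a minimal TypeScript sketch of that check. It assumes a snapshot of `_models/` has already been read from disk (for example with Node's `fs.readdirSync`); `ggmlFileName`, `isLegacyCt2Folder`, and `scanModels` are hypothetical names, not WhisperSubTranslate's actual code.

```typescript
// Map a model size (e.g. "small") to its GGML file name, as in the tree above.
function ggmlFileName(size: string): string {
  return `ggml-${size}.bin`;
}

// A CTranslate2 model is a folder; treat one containing config.json and
// model.bin as a legacy faster-whisper model that must be re-downloaded.
function isLegacyCt2Folder(entries: string[]): boolean {
  return entries.indexOf("config.json") !== -1 && entries.indexOf("model.bin") !== -1;
}

// Hypothetical snapshot of _models/: entry name -> files inside it,
// or null when the entry is a plain file rather than a folder.
type ModelsSnapshot = { [name: string]: string[] | null };

// Split the snapshot into legacy CTranslate2 folders and GGML model files.
function scanModels(snapshot: ModelsSnapshot): { legacy: string[]; ggml: string[] } {
  const legacy: string[] = [];
  const ggml: string[] = [];
  for (const name of Object.keys(snapshot)) {
    const files = snapshot[name];
    if (files === null) {
      if (/^ggml-.+\.bin$/.test(name)) ggml.push(name);
    } else if (isLegacyCt2Folder(files)) {
      legacy.push(name);
    }
  }
  return { legacy, ggml };
}
```

If `legacy` is non-empty and `ggml` lacks the selected model, the app can prompt the user to download the GGML version instead of failing at transcription time.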

Command Changes

# Previous (faster-whisper)
faster-whisper-xxl.exe --model small --language auto input.mp4

# New (whisper.cpp)
whisper-cli.exe -m ggml-small.bin -f input.wav -osrt -ml 50 -sow -l auto
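Since the app now drives whisper-cli from Node.js, the new command is easiest to assemble as an argument array. A sketch, assuming a hypothetical `WhisperOptions` shape; the flags are exactly those in the command above, and nothing else about whisper-cli is assumed.

```typescript
// Hypothetical options for one whisper-cli run.
interface WhisperOptions {
  modelPath: string;      // e.g. "_models/ggml-small.bin"
  wavPath: string;        // WAV file produced by the ffmpeg preprocessing step
  language?: string;      // "auto" asks whisper.cpp to detect the language
  maxSegmentLen?: number; // -ml: maximum segment length in characters
  splitOnWord?: boolean;  // -sow: split segments on word boundaries
}

// Build the argument list mirroring the whisper-cli command shown above.
function whisperCliArgs(o: WhisperOptions): string[] {
  const args = ["-m", o.modelPath, "-f", o.wavPath, "-osrt"];
  if (o.maxSegmentLen !== undefined) args.push("-ml", String(o.maxSegmentLen));
  if (o.splitOnWord) args.push("-sow");
  // Always pass -l explicitly; fall back to auto-detection as in the command above.
  args.push("-l", o.language !== undefined ? o.language : "auto");
  return args;
}
```

Passing such an array to Node's `child_process.execFile` or `spawn`, rather than building a single shell string, avoids quoting problems with file paths that contain spaces.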

Audio Preprocessing

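whisper-cli cannot read video files such as MP4 directly, so we automatically extract the audio with ffmpeg and convert it to the 16 kHz, mono, 16-bit PCM WAV format recommended in the whisper.cpp documentation: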

ffmpeg -y -i input.mp4 -ar 16000 -ac 1 -c:a pcm_s16le output.wav
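A sketch of how this preprocessing step could be expressed in TypeScript, assuming the app shells out to ffmpeg from Node.js. The `Command` type, `wavPathFor`, and the `.16k.wav` suffix are hypothetical choices for illustration; the ffmpeg flags are the ones in the command above.

```typescript
// Hypothetical description of one external command to run.
interface Command {
  program: string;
  args: string[];
}

// Derive the WAV path from the input path. The extension is replaced only if
// the file name itself has one, and the distinct ".16k.wav" suffix avoids
// overwriting an input that is already a WAV file.
function wavPathFor(input: string): string {
  const dot = input.lastIndexOf(".");
  const slash = Math.max(input.lastIndexOf("/"), input.lastIndexOf("\\"));
  const stem = dot > slash ? input.slice(0, dot) : input;
  return stem + ".16k.wav";
}

// ffmpeg step matching the command above: overwrite (-y), resample to
// 16 kHz (-ar 16000), downmix to mono (-ac 1), 16-bit PCM (-c:a pcm_s16le).
function ffmpegCommand(input: string): Command {
  const wav = wavPathFor(input);
  return {
    program: "ffmpeg",
    args: ["-y", "-i", input, "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", wav],
  };
}
```

The last argument (the WAV path) is the file that then goes to whisper-cli's -f option.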

How to Upgrade

Existing users can upgrade with these steps:

Back up your existing folder.
Download the latest version.
Extract the new version.
Run the app and download the models again (the model format has changed).

Note: Existing settings (API keys, etc.) are preserved. You only need to download the models again.

Conclusion

With this engine migration, WhisperSubTranslate is now lighter, easier to install, and can generate more precise subtitles.

If you encounter any issues or have feedback, please let us know on GitHub Issues. We'll keep working to improve our subtitle tools.

Thank you!