In WhisperSubTranslate v1.3.0, we switched the subtitle extraction engine from faster-whisper to whisper.cpp. This article explains why we made this decision and what changes users can expect.
- Smaller package size (~100MB vs 2GB+) - faster downloads
- Millisecond timestamp support - perfect for music videos and karaoke
- Same Whisper models used - no difference in accuracy
Why Did We Change the Engine?
Previous Issues: faster-whisper Limitations
The previous engine, faster-whisper, is a Python-based implementation. It performed well, but it had a few drawbacks:
- Distribution size: The bundled package exceeded 2GB once the Python runtime and dependencies were included
- Developer setup: Developers needed a Python environment to build and test the app
- Timestamp precision: Only second-level timestamps, not ideal for music videos
Advantages of whisper.cpp
whisper.cpp is a plain C/C++ implementation of OpenAI's Whisper model:
- Standalone executable: All features in a single exe file
- Fully open source: MIT license, no restrictions
- Active development: Continuous updates from ggml-org
- Millisecond timestamps: Precise timing with the -ml option
Performance Comparison
Comparison of actual test results:
| Item | faster-whisper | whisper.cpp |
|---|---|---|
| Developer Environment | Python | Node.js |
| Timestamp Precision | Seconds | Milliseconds |
| Processing Speed (3 min video) | ~21s | ~18s |
| GPU Acceleration | CUDA | CUDA |
| Distribution Size | ~2GB+ | ~100MB |
| Recognition Accuracy | Excellent | Excellent |
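To put the table's numbers in perspective, the short sketch below works out the real-time factor and the relative savings. It uses only the figures from the table; treating "2GB+" as exactly 2048 MB is an assumption made for the arithmetic, so the size saving is a rough lower bound.

```typescript
// Figures taken from the comparison table. The "2GB+" package size is
// treated as exactly 2048 MB here, which is an assumption.
interface EngineResult {
  name: string;
  processingSeconds: number; // time to process the 3 minute test video
  packageMegabytes: number;  // approximate distribution size
}

const videoSeconds = 3 * 60;

const fasterWhisper: EngineResult = {
  name: "faster-whisper",
  processingSeconds: 21,
  packageMegabytes: 2048,
};

const whisperCpp: EngineResult = {
  name: "whisper.cpp",
  processingSeconds: 18,
  packageMegabytes: 100,
};

// Seconds of video processed per second of wall-clock time.
function realTimeFactor(engine: EngineResult, durationSeconds: number): number {
  return durationSeconds / engine.processingSeconds;
}

// Fractional reduction going from `before` to `after` (0.25 means 25% less).
function reduction(before: number, after: number): number {
  return (before - after) / before;
}

const timeSaved = reduction(fasterWhisper.processingSeconds, whisperCpp.processingSeconds);
const sizeSaved = reduction(fasterWhisper.packageMegabytes, whisperCpp.packageMegabytes);

console.log(`${fasterWhisper.name}: ${realTimeFactor(fasterWhisper, videoSeconds).toFixed(1)}x real time`);
console.log(`${whisperCpp.name}: ${realTimeFactor(whisperCpp, videoSeconds).toFixed(1)}x real time`);
console.log(`Processing time saved: ${(timeSaved * 100).toFixed(1)}%`);
console.log(`Package size saved: ${(sizeSaved * 100).toFixed(1)}%`);
```

On these figures the speed gain is modest (roughly 14% less processing time), while the package size shrinks by about 95%, which is where most users will notice the change.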
What are Millisecond Timestamps?
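Most existing subtitle extraction tools generate timestamps with only second-level precision.
whisper.cpp, run with the -ml 50 -sow options (segments of at most 50 characters, split on word boundaries), produces short segments timed to the millisecond.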
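To make the difference concrete, here is a minimal sketch that formats the same cue start time as an SRT timestamp at both precisions. The function names and the sample time are hypothetical and chosen only for illustration; this is not the app's actual formatting code.

```typescript
// Left-pad a non-negative integer with zeros to the given width.
function pad(value: number, width: number): string {
  let text = String(value);
  while (text.length < width) {
    text = "0" + text;
  }
  return text;
}

// Format a time offset in milliseconds as an SRT timestamp (HH:MM:SS,mmm).
function formatSrtTimestamp(totalMs: number): string {
  const ms = Math.max(0, Math.round(totalMs));
  const hours = Math.floor(ms / 3600000);
  const minutes = Math.floor((ms % 3600000) / 60000);
  const seconds = Math.floor((ms % 60000) / 1000);
  const millis = ms % 1000;
  return `${pad(hours, 2)}:${pad(minutes, 2)}:${pad(seconds, 2)},${pad(millis, 3)}`;
}

// Second-level precision: drop the sub-second part before formatting.
function formatSrtTimestampSeconds(totalMs: number): string {
  return formatSrtTimestamp(Math.floor(totalMs / 1000) * 1000);
}

// Hypothetical cue that starts 83.46 seconds into the video.
const cueStartMs = 83460;
console.log(formatSrtTimestampSeconds(cueStartMs)); // 00:01:23,000
console.log(formatSrtTimestamp(cueStartMs));        // 00:01:23,460
```

With second-level precision the cue would appear almost half a second early, which is easy to notice when lyrics need to land on the beat.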
Where is This Useful?
- Music video lyrics: Lyrics sync precisely with music
- Karaoke subtitles: Create karaoke-style captions
- Live subtitles: Real-time subtitle generation for broadcasts
- Educational content: Precisely timed explanatory subtitles
Technical Changes
Detailed changes for developers:
Model Format Change
faster-whisper used CTranslate2 format (folder), while whisper.cpp uses GGML format (single .bin file):
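As a rough sketch of what the single-file layout means in practice, the helpers below resolve a model name to a GGML file. The `ggml-<name>.bin` naming follows whisper.cpp's usual convention; the models directory, the helper names, and the Hugging Face download location are assumptions for illustration and may not match what the app actually does.

```typescript
// Illustrative subset of Whisper model names, not an exhaustive list.
type ModelName = "tiny" | "base" | "small" | "medium" | "large-v3";

// whisper.cpp's conventional file name for a GGML model, e.g. "ggml-base.bin".
function ggmlModelFileName(model: ModelName): string {
  return `ggml-${model}.bin`;
}

// Hypothetical helper: join a models directory and the file name with "/",
// ignoring any trailing slashes on the directory.
function ggmlModelPath(modelsDir: string, model: ModelName): string {
  const dir = modelsDir.replace(/[\\/]+$/, "");
  return `${dir}/${ggmlModelFileName(model)}`;
}

// Assumed download location: the whisper.cpp model repository on Hugging Face.
function ggmlModelUrl(model: ModelName): string {
  return `https://huggingface.co/ggerganov/whisper.cpp/resolve/main/${ggmlModelFileName(model)}`;
}

console.log(ggmlModelPath("models", "base")); // models/ggml-base.bin
console.log(ggmlModelUrl("base"));
```

Because each model is one file rather than a folder, checking whether a model is installed reduces to checking that a single path exists.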
Command Changes
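As a rough sketch of how the new engine might be invoked, the function below assembles an argument list for the whisper.cpp command-line tool. The flags used (-m, -f, -l, -osrt, -of, -ml, -sow) are whisper.cpp CLI options; the option names on the TypeScript side, the file paths, and the choice of SRT output are assumptions, not the app's actual command.

```typescript
interface TranscribeOptions {
  modelPath: string;         // GGML model file, e.g. "models/ggml-base.bin"
  wavPath: string;           // WAV input produced by the preprocessing step
  outputBase: string;        // output path without the file extension
  language?: string;         // e.g. "en"; defaults to "auto" (language detection)
  maxSegmentLength?: number; // -ml: maximum segment length in characters
}

// Build the argument list for a whisper.cpp CLI run that writes an SRT file.
function buildWhisperCppArgs(options: TranscribeOptions): string[] {
  const args: string[] = [
    "-m", options.modelPath,
    "-f", options.wavPath,
    "-l", options.language ?? "auto",
    "-osrt",
    "-of", options.outputBase,
  ];
  // Short, word-aligned segments are only requested when a length is given.
  if (options.maxSegmentLength !== undefined) {
    args.push("-ml", String(options.maxSegmentLength), "-sow");
  }
  return args;
}

// Hypothetical paths, for illustration only.
const exampleArgs = buildWhisperCppArgs({
  modelPath: "models/ggml-base.bin",
  wavPath: "audio.wav",
  outputBase: "subtitles",
  maxSegmentLength: 50,
});
console.log(exampleArgs.join(" "));
```

In a Node.js app the resulting array could be passed to `child_process.spawn` together with the path to the whisper.cpp executable (named `whisper-cli` in recent releases). Passing arguments as an array rather than a single string avoids shell quoting problems with file names that contain spaces.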
Audio Preprocessing
whisper.cpp only supports WAV files, so we automatically convert with ffmpeg:
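The conversion step can be sketched as follows. The ffmpeg options shown (16 kHz sample rate, mono, 16-bit PCM) match the conversion recommended in the whisper.cpp documentation; the function names and the output naming scheme are assumptions, and the app's real command may differ.

```typescript
// Derive a WAV output path by swapping the input's extension for ".wav".
function wavPathFor(inputPath: string): string {
  const withoutExt = inputPath.replace(/\.[^./\\]+$/, "");
  return `${withoutExt}.wav`;
}

// Build ffmpeg arguments that extract 16 kHz mono 16-bit PCM audio.
function buildFfmpegWavArgs(inputPath: string, wavPath: string): string[] {
  return [
    "-y",                // overwrite the output file if it already exists
    "-i", inputPath,     // source video or audio file
    "-vn",               // ignore any video stream
    "-ar", "16000",      // resample to 16 kHz
    "-ac", "1",          // downmix to mono
    "-c:a", "pcm_s16le", // 16-bit little-endian PCM
    wavPath,
  ];
}

// Hypothetical input file, for illustration only.
const inputVideo = "videos/concert.mp4";
console.log(["ffmpeg", ...buildFfmpegWavArgs(inputVideo, wavPathFor(inputVideo))].join(" "));
```

The WAV file is only an intermediate artifact, so a real implementation would likely write it to a temporary directory and delete it once transcription finishes.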
How to Upgrade
Existing users can upgrade with these steps:
- Download the latest version.
- Back up your existing folder.
- Extract the new version.
- Run the app and re-download the models (the format changed from CTranslate2 to GGML).
Conclusion
With this engine migration, WhisperSubTranslate is now lighter, easier to install, and can generate more precise subtitles.
If you encounter any issues or have feedback, please let us know on GitHub Issues. We'll continue working to make better subtitle tools.
Thank you!