whisper.cpp Migration Complete - What's Changed?

In WhisperSubTranslate v1.3.0, we switched the subtitle extraction engine from faster-whisper to whisper.cpp. This article explains why we made this decision and what changes users can expect.

TL;DR (Summary):
  • Smaller package size (~100MB vs 2GB+) - faster downloads
  • Millisecond timestamp support - perfect for music videos and karaoke
  • Same Whisper models used - no difference in accuracy

Why Did We Change the Engine?

Previous Issues: faster-whisper Limitations

The previous engine, faster-whisper, is a Python-based implementation. It performed well, but left room for improvement:

  • Large distribution size: the bundled Python runtime and CTranslate2 models pushed the download past 2GB
  • Second-level timestamp precision, which is too coarse for fast speech or lyrics
  • A Python toolchain running alongside our Node.js codebase

Advantages of whisper.cpp

whisper.cpp is a pure C++ reimplementation of OpenAI Whisper:

  • A single lightweight binary with single-file GGML models (~100MB distribution)
  • Millisecond-level timestamps via the -ml and -sow options
  • The same Whisper model weights, so recognition accuracy is unchanged
  • No Python runtime required, so it integrates cleanly with our Node.js app

Performance Comparison

Comparison of actual test results:

Item                             faster-whisper    whisper.cpp
--------------------------------------------------------------
Developer Environment            Python            Node.js
Timestamp Precision              Seconds           Milliseconds
Processing Speed (3 min video)   ~21s              ~18s
GPU Acceleration                 CUDA              CUDA
Distribution Size                ~2GB+             ~100MB
Recognition Accuracy             Excellent         Excellent

What are Millisecond Timestamps?

Most existing subtitle extraction tools generate timestamps in seconds:

# Second-level timestamps (previous)
1
00:00:00,000 --> 00:00:08,000
Welcome to the channel and today we are going to talk about automation.

With the -ml 50 (max segment length) and -sow (split on word) options, whisper.cpp provides millisecond-level segmentation:

# Millisecond timestamps (whisper.cpp)
1
00:00:00,000 --> 00:00:04,640
Welcome to the channel and today we are going to

2
00:00:04,640 --> 00:00:07,520
talk about how to automate anything that is on

3
00:00:07,520 --> 00:00:08,000
web.
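The extra precision is also easy to work with programmatically. As a minimal sketch (the helper name `srtToMs` is ours, not part of the app), converting an SRT timestamp to milliseconds:

```javascript
// Hypothetical helper: convert an SRT timestamp ("HH:MM:SS,mmm") to milliseconds.
function srtToMs(stamp) {
  const [hms, ms] = stamp.split(",");
  const [h, m, s] = hms.split(":").map(Number);
  return ((h * 60 + m) * 60 + s) * 1000 + Number(ms);
}

// Second-level cues always land on a whole second;
// millisecond cues like 00:00:04,640 keep the extra precision.
console.log(srtToMs("00:00:04,640")); // 4640
```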

Where is This Useful?

Millisecond segmentation matters most where timing is tight:

  • Music videos and karaoke, where lyrics must stay in sync with the audio
  • Fast speech, where a single 8-second cue would lump several phrases together
  • Shorter cues that are easier to read and to translate line by line

Technical Changes

Detailed changes for developers:

Model Format Change

faster-whisper used CTranslate2 format (folder), while whisper.cpp uses GGML format (single .bin file):

# Previous (faster-whisper / CTranslate2)
_models/
├── small/
│   ├── config.json
│   ├── model.bin
│   ├── tokenizer.json
│   └── vocabulary.txt

# New (whisper.cpp / GGML)
_models/
├── ggml-small.bin      # Single file!
├── ggml-medium.bin
└── ggml-large-v3.bin

Command Changes

# Previous (faster-whisper)
faster-whisper-xxl.exe --model small --language auto input.mp4

# New (whisper.cpp)
whisper-cli.exe -m ggml-small.bin -f input.wav -osrt -ml 50 -sow -l auto

Audio Preprocessing

whisper.cpp only accepts WAV input, so we automatically convert to the format it expects (16 kHz, 16-bit mono PCM) with ffmpeg:

ffmpeg -y -i input.mp4 -ar 16000 -ac 1 -c:a pcm_s16le output.wav

How to Upgrade

Existing users can upgrade with these steps:

  1. Download the latest version.
  2. Back up your existing folder.
  3. Extract the new version.
  4. Run the app and re-download models (format changed).
Note: Existing settings (API keys, etc.) are preserved. You only need to download the models again.

Conclusion

With this engine migration, WhisperSubTranslate is now lighter, easier to install, and can generate more precise subtitles.

If you encounter any issues or have feedback, please let us know on GitHub Issues. We'll continue working to make better subtitle tools.

Thank you!