# Whisper Model Comparison: From tiny to large
Choosing the right Whisper model is crucial for balancing accuracy and speed. This guide compares all available models with real benchmark data.
Note: This app uses whisper.cpp. Thanks to GGML optimization, it requires significantly less VRAM than the original PyTorch Whisper, which needs roughly 10 GB for the large model.
## Quick Comparison
| Model | Disk size | VRAM | Speed (vs. large-v3) | Accuracy |
|---|---|---|---|---|
| tiny | 75 MB | ~1 GB | 32x | Good |
| base | 142 MB | ~1 GB | 16x | Better |
| small | 466 MB | ~2 GB | 6x | Great |
| medium | 1.5 GB | ~4 GB | 2x | Excellent |
| large-v3 | 3.0 GB | ~5 GB | 1x | Best |
| large-v3-turbo ⭐ | 809 MB | ~4 GB | 8x | Excellent |
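The relative speeds in the table can be turned into rough time estimates. A minimal sketch, using the numbers above; `large_v3_realtime_factor` is a hypothetical knob (how many minutes of audio large-v3 transcribes per wall-clock minute on your machine), so measured times will vary with hardware:

```python
# Relative decode speed of each model, with large-v3 as the 1x baseline
# (numbers taken from the comparison table above).
RELATIVE_SPEED = {
    "tiny": 32, "base": 16, "small": 6,
    "medium": 2, "large-v3": 1, "large-v3-turbo": 8,
}

def estimated_minutes(audio_minutes: float, model: str,
                      large_v3_realtime_factor: float = 1.0) -> float:
    """Rough transcription time in minutes for a given model.

    large_v3_realtime_factor is an assumed calibration value:
    minutes of audio that large-v3 handles per minute of wall-clock
    time on your hardware. Measure it once, then reuse it.
    """
    throughput = large_v3_realtime_factor * RELATIVE_SPEED[model]
    return audio_minutes / throughput

# A 60-minute podcast: tiny finishes ~32x faster than large-v3.
print(estimated_minutes(60, "tiny"))      # 1.875
print(estimated_minutes(60, "large-v3"))  # 60.0
```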
## Detailed Model Analysis
### tiny
The fastest model, ideal for quick drafts or when you need results immediately.
Best for: Quick previews, low-resource systems, non-critical content
### small
The sweet spot for most users. Great balance of speed and accuracy.
Best for: General use, YouTube videos, podcasts
### large-v3-turbo ⭐ Recommended
Optimized for speed while maintaining excellent accuracy. Our recommended choice for most users.
Best for: Professional content, fast processing with high accuracy
### large-v3
Maximum accuracy for critical content where every word matters.
Best for: Professional broadcasts, legal/medical content, multiple languages
## Our Recommendations
For most users: Start with small or large-v3-turbo. They offer the best balance of speed and accuracy for everyday use.
### By Use Case
- YouTube/Podcasts: small or large-v3-turbo
- Movies/TV Shows: medium or large-v3
- Quick Drafts: tiny or base
- Professional/Broadcast: large-v3
- Multiple Languages: medium or large-v3
### By Hardware
- 4GB VRAM: small, medium, large-v3-turbo
- 6GB+ VRAM: Any model including large-v3
- CPU only: tiny or base recommended
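The hardware guidance above boils down to a lookup against the VRAM column of the comparison table. A sketch, using the approximate figures from this guide (actual usage varies with quantization and audio length):

```python
# Approximate VRAM needs in GB, from the comparison table above.
VRAM_GB = {
    "tiny": 1, "base": 1, "small": 2,
    "medium": 4, "large-v3-turbo": 4, "large-v3": 5,
}

def models_that_fit(vram_gb: float) -> list[str]:
    """Return the models whose approximate VRAM need fits the budget."""
    return [m for m, need in VRAM_GB.items() if need <= vram_gb]

print(models_that_fit(4))
# ['tiny', 'base', 'small', 'medium', 'large-v3-turbo']
print(models_that_fit(6))
# ['tiny', 'base', 'small', 'medium', 'large-v3-turbo', 'large-v3']
```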