Tutorial

Creating Music Video Lyrics Subtitles

July 2024 • 5 min read

Music videos require precise subtitle timing: even a few hundred milliseconds of drift can make lyrics feel out of sync. This tutorial shows how to create accurately timed lyric subtitles.

Why Music Videos Are Different

Unlike speech, music has:

Step-by-Step Process

1. Choose the Right Model

For music, use large-v3 or medium. Both transcribe singing noticeably more accurately than the smaller models (tiny, base, small), at the cost of slower processing.

2. Specify the Language

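Don't rely on auto-detect for music. Explicitly select the song's language instead. Whisper guesses the language from the opening of the audio (roughly the first 30 seconds), which in a song is often an instrumental intro, so a wrong guess can derail the whole transcription.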

3. Extract with Millisecond Timing

WhisperSubTranslate with whisper.cpp provides millisecond-precision timestamps. This is crucial for music sync.
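Steps 1 to 3 boil down to three settings: a model, an explicit language, and SRT output. As a rough illustration of what those settings look like at the whisper.cpp level, here is a minimal Python sketch that builds a command line. It assumes the whisper.cpp command-line binary is installed as whisper-cli (older builds name it main) and that a large-v3 model file exists at the hypothetical path shown. It is not a description of how WhisperSubTranslate calls whisper.cpp internally.

```python
def build_whisper_command(audio, model, language, binary="whisper-cli"):
    """Build an argument list for a whisper.cpp run that writes an SRT file.

    The binary name and the file paths are assumptions for illustration.
    """
    return [
        binary,
        "-m", model,     # path to a ggml model file, e.g. a large-v3 build
        "-l", language,  # explicit language code instead of auto-detect
        "-osrt",         # write the transcript as an .srt file
        "-f", audio,     # input audio file
    ]


# Hypothetical file names; replace them with your own.
cmd = build_whisper_command("song.wav", "models/ggml-large-v3.bin", "en")
print(" ".join(cmd))
```

The list can be passed to subprocess.run(cmd, check=True) once the binary and model are in place.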

4. Review and Adjust

AI transcription of singing isn't perfect. Review the output and manually fix any misheard lyrics. The timing is usually accurate even when words are wrong.

Millisecond Timestamps Explained

WhisperSubTranslate uses whisper.cpp, which provides precise timing:

1
00:00:05,240 --> 00:00:07,890
Never gonna give you up

2
00:00:07,890 --> 00:00:10,450
Never gonna let you down

Notice the millisecond precision (,240, ,890, ,450). This level of accuracy is essential for music sync.
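An SRT timestamp has the form HH:MM:SS,mmm, where the three digits after the comma are milliseconds. When checking sync it often helps to work in plain milliseconds. The following is a small sketch of that conversion; the function names are made up for this example.

```python
import re

# HH:MM:SS,mmm as used in SRT files
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def srt_to_ms(stamp):
    """Convert an SRT timestamp such as '00:00:05,240' to milliseconds."""
    match = TIMESTAMP.fullmatch(stamp)
    if match is None:
        raise ValueError(f"not an SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def ms_to_srt(total_ms):
    """Convert milliseconds back to an SRT timestamp."""
    seconds, millis = divmod(total_ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


# Duration of the first cue in the example above
duration = srt_to_ms("00:00:07,890") - srt_to_ms("00:00:05,240")
print(duration)  # 2650
```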

Tips for Better Results

Use clean audio: If possible, use the official audio track rather than extracting from a video with background noise.

Handling Instrumental Sections

Whisper might try to transcribe instrumental sections. You can:
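One option after transcription is to filter out cues that contain no lyrics. The sketch below drops cues whose text is only a marker such as [Music] or a music note, then renumbers the rest. The marker words are assumptions; check what your own transcript actually contains before relying on them.

```python
import re


def is_instrumental(text):
    """Return True if a cue holds only brackets, notes, or a marker word."""
    core = re.sub(r"[\[\]\(\)\s♪*]", "", text).lower()
    # Assumed marker words; a real lyric consisting only of "Music" would also match.
    return core in {"", "music", "instrumental"}


def drop_instrumental_cues(srt_text):
    """Remove non-lyric cues from SRT text and renumber the remaining cues."""
    kept = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        if len(lines) < 2:
            continue  # not a cue: index and timing line are both required
        timing, text_lines = lines[1], lines[2:]
        if is_instrumental(" ".join(text_lines)):
            continue
        kept.append((timing, text_lines))
    cues = []
    for number, (timing, text_lines) in enumerate(kept, start=1):
        cues.append("\n".join([str(number), timing, *text_lines]))
    return "\n\n".join(cues)


sample = (
    "1\n00:00:00,000 --> 00:00:05,000\n[Music]\n\n"
    "2\n00:00:05,240 --> 00:00:07,890\nNever gonna give you up\n"
)
print(drop_instrumental_cues(sample))
```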

Background Vocals

For backing vocals or harmonies, consider:

Note: Whisper may struggle with heavily auto-tuned vocals, screaming, or multiple overlapping voices. Expect to correct these passages manually.

Editing Tips

Adjusting Timing

If subtitles are consistently early or late, you can shift all timestamps. Most video editors and subtitle tools have a "shift" feature.
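If you would rather script the shift than use an editor, the idea is simple: add the same offset to every timestamp. Here is a minimal sketch, assuming well-formed SRT text; shift_srt is a name chosen for this example, not part of any tool mentioned here.

```python
import re

TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def shift_srt(srt_text, offset_ms):
    """Shift every cue in srt_text by offset_ms (negative moves cues earlier)."""

    def shift(match):
        h, m, s, ms = (int(g) for g in match.groups())
        total = ((h * 60 + m) * 60 + s) * 1000 + ms + offset_ms
        total = max(total, 0)  # clamp so nothing lands before 00:00:00,000
        s, ms = divmod(total, 1000)
        m, s = divmod(s, 60)
        h, m = divmod(m, 60)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    shifted = []
    for line in srt_text.splitlines():
        if "-->" in line:  # only touch timing lines, never the lyric text
            line = TIMESTAMP.sub(shift, line)
        shifted.append(line)
    return "\n".join(shifted)


cue = "1\n00:00:05,240 --> 00:00:07,890\nNever gonna give you up"
print(shift_srt(cue, 300))  # lyrics appear 300 ms later
```

A positive offset delays the subtitles; a negative one makes them appear earlier.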

Line Breaks

For better readability, break long lines at natural pauses:

1
00:00:12,100 --> 00:00:16,500
I've been searching for something
that I can't quite find
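A script cannot hear where the singer pauses, but splitting near the middle at a word boundary is a reasonable first pass that you can then correct by ear. The sketch below does that; the 40-character limit is an assumed threshold, not a rule from any subtitle standard.

```python
def break_lyric_line(text, max_chars=40):
    """Split text into two lines at the space nearest its midpoint.

    Lines of max_chars or fewer are returned unchanged.
    """
    if len(text) <= max_chars:
        return text
    spaces = [i for i, ch in enumerate(text) if ch == " "]
    if not spaces:
        return text  # a single long word cannot be split here
    middle = len(text) // 2
    # Nearest space to the midpoint; on a tie, prefer the later one.
    cut = min(spaces, key=lambda i: (abs(i - middle), -i))
    return text[:cut] + "\n" + text[cut + 1:]


print(break_lyric_line("I've been searching for something that I can't quite find"))
```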

Karaoke Style (Advanced)

For karaoke-style subtitles where words highlight as they're sung, you'll need specialized software. WhisperSubTranslate provides the base timing which you can then enhance.
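As one illustration of that enhancement step, the ASS subtitle format used by Aegisub marks karaoke syllables with {\k} tags whose durations are given in centiseconds. The sketch below builds such a tagged line from a cue's start and end time. It splits the duration evenly across words, which is a crude placeholder assumption since an SRT cue carries no per-word timing; real karaoke timing still needs adjusting by ear.

```python
def karaoke_text(words, start_ms, end_ms):
    """Prefix each word with an ASS karaoke tag, dividing the cue evenly."""
    if not words:
        return ""
    total_cs = (end_ms - start_ms) // 10  # \k durations are in centiseconds
    base, extra = divmod(total_cs, len(words))
    parts = []
    for i, word in enumerate(words):
        # Hand out the remainder one centisecond at a time to the first words.
        duration = base + (1 if i < extra else 0)
        parts.append(f"{{\\k{duration}}}{word}")
    return " ".join(parts)


# First cue from the example above: 00:00:05,240 to 00:00:07,890
print(karaoke_text("Never gonna give you up".split(), 5240, 7890))
```

The resulting text goes into the Text field of an ASS dialogue line, where each tag can then be lengthened or shortened in the editor.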

Recommended Workflow

  1. Extract audio from video if needed
  2. Process with the large-v3 model and the language set explicitly
  3. Export SRT file
  4. Open in subtitle editor (Aegisub, Subtitle Edit)
  5. Compare with official lyrics if available
  6. Fine-tune timing while watching the video
  7. Export final version
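For step 1, a common way to extract audio is ffmpeg. The sketch below builds a command that drops the video stream and writes 16 kHz mono 16-bit WAV, the input format whisper.cpp's own command-line tool expects. Whether WhisperSubTranslate needs this conversion is not stated here, so treat it as an optional step; the file names are placeholders.

```python
def build_ffmpeg_extract(video, wav_out):
    """Build an ffmpeg argument list that extracts 16 kHz mono WAV audio."""
    return [
        "ffmpeg",
        "-i", video,          # input video file
        "-vn",                # drop the video stream
        "-ac", "1",           # downmix to mono
        "-ar", "16000",       # resample to 16 kHz
        "-c:a", "pcm_s16le",  # 16-bit PCM audio
        wav_out,
    ]


# Placeholder file names
extract_cmd = build_ffmpeg_extract("music_video.mp4", "song.wav")
print(" ".join(extract_cmd))
```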

Pro Tip: Search for official lyrics online and use them as reference. You can fix the words while keeping WhisperSubTranslate's accurate timing.
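This correction can be partly automated. The sketch below swaps each transcribed line for its closest official lyric using Python's difflib and leaves the line alone when nothing is similar enough. Only the text changes, so the cue timing is untouched. The 0.6 similarity cutoff is an assumed starting point, and the misheard lines are invented for the example.

```python
import difflib


def correct_lines(transcribed, official, cutoff=0.6):
    """Replace each transcribed line with its closest official lyric, if any."""
    corrected = []
    for line in transcribed:
        matches = difflib.get_close_matches(line, official, n=1, cutoff=cutoff)
        # Keep the original line when no official line is similar enough.
        corrected.append(matches[0] if matches else line)
    return corrected


official = ["Never gonna give you up", "Never gonna let you down"]
misheard = ["Never gonna give you op", "Never gonna let you town"]
print(correct_lines(misheard, official))
```

Review the result before exporting: repeated or very similar lines, such as choruses with small variations, can be matched to the wrong official line.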