[MEncoder-users] Convert VOB subtitles to SRT format

Peter Cordes peter at cordes.ca
Thu Aug 2 02:30:10 CEST 2007


On Mon, Jul 30, 2007 at 05:48:22PM +0200, Grozdan Nikolov wrote:
> Hi,
> 
> I know this is not mencoder related but I'm only subscribed to this list.... I 
> have a subtitle here (subtitle.sub and subtitle.idx files - 1 language) and I 
> want to convert it to the SRT format. How do I do it? I read at 
> http://www.mplayerhq.hu/DOCS/HTML/en/subosd.html that it is possible but 
> neither the man page nor the link above is very clear on how to do it. Can 
> someone give me a command line example?

 I just did this the other day.  avidemux has a nice tool for OCRing a
vobsub into a SubRip (.srt) sub.  You give it the .idx, and choose which
language you want.  It works by matching exact glyphs, so it always gets
I vs. l wrong because they always look the same in all the vobsubs I've
seen.  You have to type in the character for everything it hasn't seen yet,
but once you've hit most of the alphabet it doesn't take long.  If you see
parts of letters, change the threshold it uses to draw contours of things.
It's better to have to tell it that it's found tt or ff than to deal with
the left half of an m, or something.

 I had mixed results trying to type in non-ASCII characters with accents
using xvkbd (an on-screen keyboard which can switch to different layouts:
Spanish, French, German, UK (for the pound sign), etc.).  For English, it
worked perfectly.

 When you're done, save your glyphs to a file so you can load them next
time.  avidemux's file selector remembers what directory you were in, so
the easiest thing to do is make a symlink to your glyph file in your home
dir from wherever you're working on the movie.  Then you don't have to go
far in the file selector dialog, and then back.
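
 The symlink trick could look something like this.  The helper name, the
glyph file name and the movie directory are all made up for illustration,
so substitute your own.

```shell
# Hypothetical helper: link a saved glyph file into a working directory.
# Usage: link_glyphs GLYPH_FILE [WORK_DIR]
link_glyphs() {
  glyphs=$1
  workdir=${2:-.}
  # -s makes a symbolic link; the link is named after the glyph file itself.
  ln -s -- "$glyphs" "$workdir/$(basename -- "$glyphs")"
}

# Example (hypothetical paths):
# link_glyphs "$HOME/vobsub.glyphs" /data/movies/some-movie
```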

 To spell check the srt, I used subtitleeditor.  It has a spell checker.  
It can even play the segment of the movie that goes with the selected
subtitle.  I'd like to find a spell checker that realized it was working on
text OCRed from a sans-serif font, and would try switching l and I as a
first option in the replace window.  Unfortunately, subtitleeditor is not
smart like that.  (If those letters look the same to you in this email, go
find a font where you can tell them apart.  And o, O, and 0, while you're at
it.)
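
 Lacking a smarter spell checker, a crude first pass for the I/l mix-up can
be done with grep.  This is only a heuristic sketch, not something
subtitleeditor does: it assumes a capital I directly after a lowercase
letter is probably a misread l.  The function name is made up.

```shell
# Hypothetical helper: print the lines (with line numbers) of an .srt file
# where a capital I directly follows a lowercase letter, e.g. "heIlo".
# Those are likely l's that the OCR read as I.  It cannot catch a misread
# letter at the start of a word, so it is a first pass only.
suspect_capital_i() {
  grep -n '[[:lower:]]I' -- "$1"
}

# Example (hypothetical file name):
# suspect_capital_i movie.srt
```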


 So, avidemux for vobsub -> srt, and subtitleeditor for srt spellcheck.


 I was making srt subs just for the forced vobsubs in the English subtitle
track.  I don't have rar-2.80 on my amd64, and I didn't want to leave the
vobsubs uncompressed.  I was able to merge them into an mkv with the movie,
but mplayer (and vlc and xine) seem to ignore the .idx which goes into the
codecprivate.  I edited the .idx to say forced subs: ON before I mkvmerged
them, and looking at the mkv with less I could see that line in there.
Unfortunately, mplayer doesn't seem to actually only play forced subs.
Also, it won't default to playing the vobsub track that has the default flag
in an mkv, maybe because one sub track always ends up with the default flag,
and mplayer is working around that annoyance.  (I saw a post from someone
who muxed an empty subtitle track as the default for that reason, presumably
with a different player.)

 There might be a bug in mplayer re: forced subs only being set in the
".idx" in an mkv, since pressing F switches to forced subs: disabled, and
pressing it again switches to enabled (and then it really is enabled).

 Anyway, so I decided to just make srt subs for the forced subs.  The
problem became finding which subs were forced.  spuunmux (in Ubuntu's
dvdauthor package) can read a vobsub .sub and write an XML file and a
directory of png images of the bitmaps.  If there was an option to just
write the xml, it would run much faster, but it doesn't take too long as it
is.

I had multiple movies to look for forced subs in, so I did:
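mkdir -p /tmp/subs
for movie in */; do
  movie=${movie%/}
  # Skip anything we can't cd into, so unrar never runs in the wrong dir.
  pushd "$movie" >/dev/null || continue
  # Extract the archive only if no cd1 vobsub is present yet.
  ls *cd1*.sub >/dev/null 2>&1 || unrar x *.rar
  for i in 1 2; do
    spuunmux -o /tmp/subs/"$movie"."$i" *cd"$i"*.sub
  done
  popd >/dev/null
done
# List the XML files that mention forced subs at all.
grep -l force /tmp/subs/*.xml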

Then to find which subs are the forced ones
grep force /tmp/subs/movie.xml
 (spuunmux seems to generate bogus timestamps.  They're maybe off by a
factor of some frame rate ratio?  No idea why.)
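
 To get subtitle numbers rather than raw XML lines, something like the
following might do.  It assumes spuunmux writes one <spu ...> element per
line and that forced ones mention "force" on that line, so check your own
XML before trusting it.  The function name is made up.

```shell
# Hypothetical helper: print the 0-based index of every <spu> element
# whose line mentions "force".  Assumes one <spu ...> element per line.
forced_indices() {
  awk 'BEGIN { n = 0 }
       /<spu / { if ($0 ~ /force/) print n; n++ }' "$1"
}

# Example:
# forced_indices /tmp/subs/movie.xml
```

 Add one to each number if the tool you compare against counts from 1.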

 Then you can mark the forced ones in subtitleeditor, and then delete all
the non-forced ones.  (You can add a "name" column in subtitleeditor, and
use it to put a mark before you start deleting, so the numbers still match
up (after you delete one to correct for 0-based vs. 1-based.))

 You can always use an image viewer to see the text for a given subtitle
number.


 Anyway, hope this helps.

-- 
#define X(x,y) x##y
Peter Cordes ;  e-mail: X(peter at cor , des.ca)

"The gods confound the man who first found out how to distinguish the hours!
 Confound him, too, who in this place set up a sundial, to cut and hack
 my day so wretchedly into small pieces!" -- Plautus, 200 BC