[FFmpeg-devel] [PATCH] lavc: make invalid UTF-8 in subtitle output a non-fatal error

Nicolas George nicolas.george at normalesup.org
Thu Jun 27 19:35:08 CEST 2013

On octidi 8 Messidor, year CCXXI, wm4 wrote:
> Some people actually use libav* for playback. There is no transcoding
> in this case.

If we are not talking about the ffmpeg command-line tool, there is no
problem: any program is free to handle errors however it sees fit, including
ignoring them.

> Other than the fact that the program using libavcodec will not
> necessarily have a -sub_charenc option (but perhaps an equivalent)...
> yes, having the correct codepage would be ideal, but you are a bit too
> optimistic about the amount of broken messes out there. Also, do you
> really expect users to open subtitles with a text editor first to
> figure out the codepage?

That depends on the user, but you are mostly burning a straw man here. I
expect that most usable applications (I am not talking about quick-and-dirty
hacks that can only be useful to their authors) implement encoding
autodetection heuristics.

(At some point I intend to add helpers for that in lavf, but I have not yet
had time to continue working on that.)
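
To make "autodetection heuristics" a little more concrete: one of the
cheapest checks an application could run is byte-order-mark sniffing. The
sketch below is only an illustration; guess_charset_from_bom is a
hypothetical name, not an existing lavf helper, and a real detector would
need further heuristics for files without a BOM.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, NOT part of libavformat.
 * Returns a charset name (as commonly accepted by iconv) when the buffer
 * starts with a known byte order mark, or NULL when no BOM is present. */
const char *guess_charset_from_bom(const uint8_t *buf, size_t len)
{
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return "UTF-8";
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return "UTF-16LE";  /* UTF-32LE also starts this way; ignored here */
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return "UTF-16BE";
    return NULL;            /* no BOM: the caller must try something else */
}
```

A NULL result says nothing about the encoding; it only means this
particular heuristic has no opinion.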

>			   The situation currently is that if there
> happens to be a subtitle event with, say, a broken umlaut, the user
> can't see the line (will he even see the error messages?). And if he
> does notice that something is wrong, has to stop playback, guess the
> correct codepage, restart, and repeat until ffmpeg is happy. Even if he
> knows that most of the text would be readable, and he doesn't consider
> it worth the effort to fix it, ffmpeg will simply stay in the way.
> Auto-detection can return incorrect results too. Even worse,
> auto-detection as well as conversion with iconv could succeed without
> indication that something is wrong, even if they produce garbage. This
> actually does happen in some cases. And then you have broken files
> again. (Not technically broken as they're valid UTF-8, but useless.)

All that you say here is completely true, but ignoring the error will not
make it any better. Quite the contrary in fact.

> My point is that displaying broken data is slightly better than
> displaying nothing at all. On the other hand, I don't really get your
> point. If everything is completely broken, it will be immediately
> obvious to the user. Why would you drop the subtitle events at all?
> Character salad is an obvious hint that it's a codepage issue, while
> missing subs will make it harder for the user to figure out what
> exactly went wrong.

Your arguments are completely backwards here; I cannot understand your
logic at all.

The role of a library is not to decide on policy; that is up to the
application, relying on information returned by the library. Your proposal
removes the error code that signals an encoding problem to the application,
and that is just plain wrong.
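
The split between library and application can be sketched as follows. This
is a minimal illustration, not libavcodec code: ERR_INVALID_UTF8,
check_utf8, enum sub_policy and app_should_display are all hypothetical
names. The library side only reports whether the text is well-formed UTF-8;
the application side decides what to do with that information.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical error code, not an FFmpeg constant. */
#define ERR_INVALID_UTF8 (-1)

/* "Library" side: report whether buf[0..len) is well-formed UTF-8.
 * Returns 0 on success, ERR_INVALID_UTF8 otherwise. No policy here. */
int check_utf8(const uint8_t *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        uint8_t c = buf[i];
        size_t n;           /* number of continuation bytes expected */
        uint32_t cp, min;   /* decoded code point, smallest legal value */

        if (c < 0x80) { i++; continue; }
        else if ((c & 0xE0) == 0xC0) { n = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { n = 3; cp = c & 0x07; min = 0x10000; }
        else return ERR_INVALID_UTF8;  /* stray continuation or bad lead byte */

        if (len - i <= n)              /* sequence truncated at end of buffer */
            return ERR_INVALID_UTF8;
        for (size_t k = 1; k <= n; k++) {
            if ((buf[i + k] & 0xC0) != 0x80)
                return ERR_INVALID_UTF8;
            cp = (cp << 6) | (buf[i + k] & 0x3F);
        }
        /* Reject overlong forms, surrogates and out-of-range values. */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return ERR_INVALID_UTF8;
        i += n + 1;
    }
    return 0;
}

/* "Application" side: the policy lives here, not in the library. */
enum sub_policy { SUB_DROP, SUB_SHOW_ANYWAY };

/* Returns 1 if the subtitle event should be displayed, 0 if dropped. */
int app_should_display(const uint8_t *text, size_t len, enum sub_policy policy)
{
    if (check_utf8(text, len) == 0)
        return 1;
    /* The error is known; what to do about it is the application's call. */
    return policy == SUB_SHOW_ANYWAY;
}
```

A player that prefers character salad over missing lines would pass
SUB_SHOW_ANYWAY; a transcoder that must not write broken files would pass
SUB_DROP. Both need the error code to make that choice.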


  Nicolas George