[FFmpeg-devel] [PATCH] avformat: Implement subtitle charenc guessing

Nicolas George george at nsup.org
Sun Dec 14 17:06:02 CET 2014

On tridi 23 Frimaire, year CCXXIII, Rodger Combs wrote:
> I couldn't see a sensible way to do this in lavc, since the detector
> libraries generally require more than one packet to work effectively.
> Looking at that doxy again, I can see how the detection could be done in
> lavf and the conversion in lavc, but I don't really see an advantage there
> other than fewer API changes.

There is no benefit in doing the conversion in lavc for text files, but text
files processed by lavf are not the only source of subtitles. The conversion
in lavc must stay there for those cases, and the conversion in lavf must
work gracefully with it.

> So, by default it'd just handle encoding, and then additional
> normalization features could be enabled by the consumer? Sounds useful
> indeed.

Something like that. You can have a look at the first draft for the API


Splitting lines and normalizing LF was enabled by a flag.

The API itself will probably need to be changed to allow pluggable detection
modules without using more global state.

> I like this model in general, but it brings up a few questions that I kind
> of dodged in my patch. For instance, how should lavu determine which
> module's output to prefer if there are conflicting charenc guesses? How
> can the consumer choose between the given guesses?

> In my patch, preference is very simplistic and the order is hardcoded. In
> a more modular system, it'd have to be a bit more complex; I can imagine
> some form of scoring system, or even another type of module that ranks
> possible guesses, but that could get very complex very fast. Any ideas for
> this?

In this case, I believe that keeping it simple at the API level is the best
approach: the detection state is held in a structure, and each detection
module is called in turn with the same structure and updates it with its
result.

Then, it is only a matter of specifying what an acceptable "update" is: only
change a value if the new value is more sure than the previous one.
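
To make the update rule concrete, here is a minimal sketch in C. Everything
in it is an assumption made for illustration: the CharencGuess structure, its
two fields, the 0..100 certainty scale and the two toy detectors are
hypothetical and do not come from any FFmpeg header or from the patch under
discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical shared detection state; the fields are illustrative only. */
typedef struct CharencGuess {
    const char *charenc;  /* best encoding name so far, NULL if none */
    int certainty;        /* 0..100, how sure the best guess is */
} CharencGuess;

/* A detection module inspects the data and may update the shared state. */
typedef void (*CharencDetector)(CharencGuess *g,
                                const unsigned char *buf, size_t len);

/* The only update rule: replace the guess when the new one is more sure. */
static void guess_update(CharencGuess *g, const char *charenc, int certainty)
{
    assert(g && charenc);
    if (certainty > g->certainty) {
        g->charenc   = charenc;
        g->certainty = certainty;
    }
}

/* Toy module: a UTF-8 byte order mark is treated as a certain answer. */
static void detect_utf8_bom(CharencGuess *g,
                            const unsigned char *buf, size_t len)
{
    if (len >= 3 && !memcmp(buf, "\xEF\xBB\xBF", 3))
        guess_update(g, "UTF-8", 100);
}

/* Toy module: data with no byte >= 0x80 is plausibly plain ASCII. */
static void detect_ascii(CharencGuess *g,
                         const unsigned char *buf, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++)
        if (buf[i] >= 0x80)
            return;
    guess_update(g, "US-ASCII", 60);
}

/* Call every module in turn with the same structure. */
static CharencGuess run_detectors(const unsigned char *buf, size_t len)
{
    static const CharencDetector detectors[] = {
        detect_ascii, detect_utf8_bom,
    };
    CharencGuess g = { NULL, 0 };
    size_t i;

    for (i = 0; i < sizeof(detectors) / sizeof(detectors[0]); i++)
        detectors[i](&g, buf, len);
    return g;
}
```

With this rule the order of the modules does not affect which guess has the
highest certainty, so no separate ranking module is needed; ties simply keep
the earlier guess.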

As for the exact fields that must be present in the structure, that depends
on the exact useful information each relevant library can return.

> In my patch, the consumer can override the choice of encoding by making
> changes to the AVFormatContext between "header reading" and retrieving the
> packet; it seems like the best way to do so in your system would be to
> pass a callback.

Can you explain in what situation this kind of overriding would be

> On a bit of a side-note: my system is designed to make every possible
> effort to return a recoded packet, with multiple layers of fallback
> behavior in case the first guess turns out to be incorrect or the source
> file is outright invalid. I wouldn't expect that to be significantly more
> difficult with your design, but I wonder what your opinions on the setup
> are?

For this, I believe it depends on the user. Some users want everything to
work automagically; others want to be notified when even the smallest detail
is unexpected. In the end, it should probably come down to something like

ffmpeg -text_encoding certainty_threshold=80:allow_substitute=invalid

for example, to accept a guess only when it has at least 80% certainty and
to allow invalid input sequences to be replaced by a mask character.
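
As an illustration of how such an option string could be interpreted, here
is a small self-contained parser in C. The -text_encoding option is only a
proposal in this mail, and the TextEncodingOpts structure and the function
names below are hypothetical. A real patch would more likely rely on existing
lavu helpers such as av_dict_parse_string() than on hand-written parsing.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of the option string. */
typedef struct TextEncodingOpts {
    int certainty_threshold;  /* percent, 0..100 */
    int substitute_invalid;   /* 1 if allow_substitute=invalid was given */
} TextEncodingOpts;

/* Compare a key that is not NUL-terminated against a known name. */
static int key_is(const char *key, size_t len, const char *name)
{
    return len == strlen(name) && !strncmp(key, name, len);
}

/* Parse "key=value:key=value"; return 0 on success, -1 on bad input. */
static int parse_text_encoding_opts(const char *str, TextEncodingOpts *o)
{
    assert(str && o);
    o->certainty_threshold = 0;
    o->substitute_invalid  = 0;

    while (*str) {
        size_t pair_len = strcspn(str, ":");
        const char *eq  = memchr(str, '=', pair_len);
        const char *val;
        size_t key_len, val_len;

        if (!eq)
            return -1;
        key_len = (size_t)(eq - str);
        val     = eq + 1;
        val_len = pair_len - key_len - 1;

        if (key_is(str, key_len, "certainty_threshold")) {
            char *end;
            long v = strtol(val, &end, 10);
            /* The number must span the whole value and be a percentage. */
            if (!val_len || end != val + val_len || v < 0 || v > 100)
                return -1;
            o->certainty_threshold = (int)v;
        } else if (key_is(str, key_len, "allow_substitute")) {
            if (!key_is(val, val_len, "invalid"))
                return -1;
            o->substitute_invalid = 1;
        } else {
            return -1;  /* unknown key */
        }

        str += pair_len;
        if (*str == ':')
            str++;
    }
    return 0;
}

/* A guess is accepted only at or above the configured threshold. */
static int guess_is_acceptable(const TextEncodingOpts *o, int certainty)
{
    return certainty >= o->certainty_threshold;
}
```

Users who want everything to work automagically would leave the threshold at
0 and enable substitution; users who want to be notified would raise the
threshold and leave substitution off.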

> So, the text-file-read API would buffer the entire input file and perform
> charenc detection/conversion and/or other normalization, then FFTextReader
> would read from the normalized buffer?

Something like that. Since FFTextReader is internal, there is room to choose
the exact implementation.
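
A rough sketch of that flow in C, under loud assumptions: the whole file is
already in memory, charset conversion is left out entirely, and the names
NORM_FLAG_LF, normalize_newlines() and TextBufReader are invented for this
example. It does not reflect the actual FFTextReader internals; it only
shows a reader consuming an already normalized buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NORM_FLAG_LF 1  /* hypothetical flag: CRLF and lone CR become LF */

/* Normalize line endings in place and return the new length. */
static size_t normalize_newlines(char *buf, size_t len, int flags)
{
    size_t r, w = 0;

    assert(buf || !len);
    if (!(flags & NORM_FLAG_LF))
        return len;
    for (r = 0; r < len; r++) {
        if (buf[r] == '\r') {
            /* For a CRLF pair, step onto the LF so only one LF is written. */
            if (r + 1 < len && buf[r + 1] == '\n')
                r++;
            buf[w++] = '\n';
        } else {
            buf[w++] = buf[r];
        }
    }
    return w;
}

/* Hypothetical reader over the normalized buffer. */
typedef struct TextBufReader {
    const char *buf;
    size_t len, pos;
} TextBufReader;

/* Return the next line (not NUL-terminated) and its length without the
 * trailing LF, or NULL once the buffer is exhausted. */
static const char *reader_next_line(TextBufReader *r, size_t *line_len)
{
    const char *start, *nl;
    size_t left;

    assert(r && line_len);
    if (r->pos >= r->len)
        return NULL;
    start = r->buf + r->pos;
    left  = r->len - r->pos;
    nl    = memchr(start, '\n', left);
    *line_len = nl ? (size_t)(nl - start) : left;
    r->pos += *line_len;
    if (r->pos < r->len)
        r->pos++;  /* skip the LF itself */
    return start;
}
```

Because the reader only ever sees LF-terminated lines, the individual
subtitle demuxers would not need their own handling of CR or CRLF.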


  Nicolas George