[FFmpeg-devel] [PATCH] lavc: support subtitles charset conversion.

Clément Bœsch ubitux at gmail.com
Mon Jan 7 10:22:41 CET 2013


On Sat, Jan 05, 2013 at 12:54:37PM +0100, Nicolas George wrote:
> On quintidi 15 Nivôse, year CCXXI, Clement Boesch wrote:
> > I fail to see how it is more elegant; the codec properties sound like the
> > best place to declare such generalities.
> 
> It is hard to put elegance considerations into words. Looking at the various
> existing CODEC_CAP, I find they are usually more universal and/or more
> relevant to the API user, although I realize there are already exceptions.
> 
> >					   Using the context structure is
> > only a necessity if we need on the fly changes, which don't sound common
> > at all. And if we find such insanity, I'd suggest to fix that mess in the
> > decoder or the demuxer itself.
> 
> What about this, that I thought of this morning:
> 
> Sometimes, the recoding will perforce be done by the demuxer. At other
> times, it will be done by lavc. In any case, the original encoding should be
> exposed to the API caller, so that this:
> 
> ffmpeg -ss 5 -i file.fmt [ -sub_charenc copy ] shifted_file.fmt
> 
> can work. And for convenience and compatibility reasons, it is probably
> best if the original encoding is exported in the same field.
> 

I don't understand what you mean here: -sub_charenc is used to specify the
character encoding of the input, so using -sub_charenc copy would basically
mean "do nothing". We use the specified charenc to convert to UTF-8 because
the decoder must output only UTF-8 (and even if we replace the internal ASS
representation with the StyledSubtitles, I'd rather deal with UTF-8 only).
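
As a rough illustration of the recoding step described above, here is a
minimal sketch. Everything in it is an assumption made for the example:
recode_to_utf8() and latin1_to_utf8() are hypothetical helpers, not lavc
functions, only ISO-8859-1 is handled so that the code stays self-contained
(a real implementation would presumably rely on a general converter), and
treating a missing charenc as "input is already UTF-8" is my guess at the
"do nothing" case.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: recode ISO-8859-1 bytes to NUL-terminated UTF-8.
 * Returns the number of bytes written (without the NUL), or -1 if dst is
 * too small. */
static int latin1_to_utf8(const unsigned char *src, size_t src_len,
                          char *dst, size_t dst_size)
{
    size_t o = 0;

    if (dst_size == 0)
        return -1;
    for (size_t i = 0; i < src_len; i++) {
        unsigned char c = src[i];
        size_t need = c < 0x80 ? 1 : 2;

        if (o + need + 1 > dst_size)   /* keep room for the final NUL */
            return -1;
        if (c < 0x80) {
            dst[o++] = (char)c;                    /* ASCII: unchanged */
        } else {
            dst[o++] = (char)(0xC0 | (c >> 6));    /* lead byte */
            dst[o++] = (char)(0x80 | (c & 0x3F));  /* continuation byte */
        }
    }
    dst[o] = '\0';
    return (int)o;
}

/* Hypothetical entry point: charenc plays the role of -sub_charenc.
 * NULL is assumed to mean "already UTF-8", so the text is only copied. */
static int recode_to_utf8(const char *charenc,
                          const unsigned char *src, size_t src_len,
                          char *dst, size_t dst_size)
{
    if (!charenc) {
        if (src_len + 1 > dst_size)
            return -1;
        memcpy(dst, src, src_len);
        dst[src_len] = '\0';
        return (int)src_len;
    }
    if (!strcmp(charenc, "ISO-8859-1"))
        return latin1_to_utf8(src, src_len, dst, dst_size);
    return -1;  /* any other charset is out of scope for this sketch */
}
```

Whatever the input charset, the caller only ever sees UTF-8 in dst, which
is the point of doing the conversion before the decoder output.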

> Thus my proposal with sub_charenc_mode: the first component that decides
> it can do the work sets it. It would work like this:
> 
> 1. If the demuxer knows the character encoding, it sets sub_charenc.

> 2. If the demuxer does the recoding, then it sets sub_charenc_mode to DONE,
>    otherwise it leaves it to its default 0.

Demuxers like TED and WebVTT would set it to DONE to avoid any recoding?

> 3. If mode is still 0, the codec init function sets it to either PRE, POST
>    or INTERNAL depending on its needs.

OK

> 4. If mode is still 0 after codec init and a character encoding is set, lavc
>    reports an error.
> 

OK
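
The four steps above could be sketched roughly as follows. This is only an
illustration under stated assumptions: the struct, the enum values and the
three functions are hypothetical names, not the lavf/lavc API, and I assume
that a codec which cannot deal with character encodings at all simply
leaves the mode at 0 in its init function.

```c
#include <stddef.h>

/* Hypothetical mode values; 0 is the "nobody claimed the work" default. */
enum SubCharencMode {
    MODE_UNSET = 0,
    MODE_DONE,      /* the demuxer already recoded the text */
    MODE_PRE,       /* lavc recodes before calling the decoder */
    MODE_POST,      /* lavc recodes the decoder output */
    MODE_INTERNAL   /* the decoder handles the recoding itself */
};

/* Hypothetical stand-in for the fields shared by demuxer and codec. */
struct SubCtx {
    const char *sub_charenc;              /* NULL if unknown */
    enum SubCharencMode sub_charenc_mode;
};

/* Steps 1 and 2: the demuxer exports what it knows and what it did. */
static void demuxer_setup(struct SubCtx *c, const char *known_charenc,
                          int demuxer_recodes)
{
    if (known_charenc)
        c->sub_charenc = known_charenc;
    if (demuxer_recodes)
        c->sub_charenc_mode = MODE_DONE;
}

/* Step 3: the codec init function picks a mode only if none is set yet.
 * codec_need may be MODE_UNSET for a codec that cannot recode (assumed). */
static void codec_init(struct SubCtx *c, enum SubCharencMode codec_need)
{
    if (c->sub_charenc_mode == MODE_UNSET)
        c->sub_charenc_mode = codec_need;
}

/* Step 4: a charset was requested but no component took the job. */
static int lavc_check(const struct SubCtx *c)
{
    if (c->sub_charenc_mode == MODE_UNSET && c->sub_charenc)
        return -1;
    return 0;
}
```

With this layout, a demuxer that sets DONE wins over whatever the codec
would have chosen, and the error in step 4 only triggers when a charset is
set and neither side claimed the recoding.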

[...]

It sounds a bit complicated for not much gain, but fine. I'll try to come up
with something soon.

-- 
Clément B.