[FFmpeg-devel] [PATCH] lavc: support subtitles charset conversion.
Clément Bœsch
ubitux at gmail.com
Fri Jan 4 11:35:51 CET 2013
On Fri, Jan 04, 2013 at 11:20:12AM +0100, Nicolas George wrote:
> Le quintidi 15 nivôse, an CCXXI, Clement Boesch a écrit :
> > I was wondering what in my patch was preventing any code evolution, so my
> > question was about any interface with the user.
> >
> > Now that the character re-encoding is done before decoding, do you see any
> > design problem with it?
>
> Well, you answer your own question:
>
> > Note about the FIXME in the patch: we should add some kind of flags to the
> > codec because the recoding will obviously have unwanted behaviour with
> > bitmap sub packets.
>
> I had not thought of that. You are perfectly right.
>
I think Wolfram is working on something like this. But I may send a patch
anyway soon (sorry).
> If we go ahead with the mode field I suggested, then the solution would
> probably be to let the default be "do nothing", or even maybe "croak if the
> encoding is set"; adding "if (!mode) mode = PRE" in all text decoders init
> is not much work, and no more than adding flags on the codecs.
>
> OTOH, knowing what are text subtitles codecs and what are bitmap subtitles
> codecs would be useful not only there, so a codec flags would probably be
> nice too.
>
Adding a "support charset recoding" codec description property sounds like
the most simple and reliable thing to do.
> More generally, this issue is an example of why this should not be rushed:
> this is a tricky issue with a lot of pitfalls. I believe each time a new
> pitfall is found and avoided, it should reset the forethought time.
>
> Now, for the technical comments:
>
[...]
> > volatile int ff_avcodec_locked;
> > static int volatile entangled_thread_counter = 0;
> > @@ -1058,6 +1061,12 @@ int attribute_align_arg avcodec_open2(AVCodecContext *avctx, const AVCodec *code
> > ret = AVERROR(EINVAL);
> > goto free_and_end;
> > }
> > + if (!HAVE_ICONV && avctx->codec_type == AVMEDIA_TYPE_SUBTITLE && avctx->sub_charenc) {
> > + av_log(avctx, AV_LOG_ERROR, "Character encoding subtitles conversion needs "
> > + "a libavcodec built with iconv support.\n");
> > + ret = AVERROR(EINVAL);
>
> ENOSYS? PATCHWELCOME?
>
Patch welcome? I'm not sure I want a fficonv…
OK for ENOSYS, changed locally.
>
> > + goto free_and_end;
> > + }
> > }
> > end:
> > ff_unlock_avcodec();
> > @@ -1816,6 +1825,64 @@ int attribute_align_arg avcodec_decode_audio4(AVCodecContext *avctx,
> > return ret;
> > }
> >
> > +#if HAVE_ICONV
> > +#define UTF8_MAX_BYTES 4 /* 5 and 6 bytes sequences should not be used */
>
> ???
>
> > +static int recode_subtitle(AVCodecContext *avctx,
> > + AVPacket *outpkt, const AVPacket *inpkt)
> > +{
> > + iconv_t cd = (iconv_t)-1;
> > + int ret = 0;
> > + char *inb, *outb;
> > + size_t inl, outl;
> > + AVPacket tmp;
> > +
> > + if (!avctx->sub_charenc) // FIXME add a codec prop for sub codec supporting it
> > + goto end;
> > +
> > + cd = iconv_open("UTF-8", avctx->sub_charenc);
> > + if (cd == (iconv_t)-1) {
> > + av_log(avctx, AV_LOG_ERROR, "Unable to open iconv context "
> > + "with input character encoding \"%s\"\n", avctx->sub_charenc);
> > + ret = AVERROR(errno);
> > + goto end;
> > + }
> > +
> > + inb = inpkt->data;
> > + inl = inpkt->size;
> > +
> > + if (inl >= INT_MAX / UTF8_MAX_BYTES - FF_INPUT_BUFFER_PADDING_SIZE) {
> > + av_log(avctx, AV_LOG_ERROR, "Subtitles packet is too big for recoding\n");
> > + ret = AVERROR(ENOMEM);
> > + goto end;
> > + }
> > +
> > + ret = av_new_packet(&tmp, inl * UTF8_MAX_BYTES);
>
> + FF_INPUT_BUFFER_PADDING_SIZE
>
av_new_packet does it
> > + if (ret < 0)
> > + goto end;
> > + outpkt->data = tmp.data;
> > + outpkt->size = tmp.size;
> > + outb = outpkt->data;
> > + outl = outpkt->size;
> > +
> > + if (iconv(cd, &inb, &inl, &outb, &outl) == (size_t)-1 ||
> > + iconv(cd, NULL, NULL, &outb, &outl) == (size_t)-1 ||
> > + outl >= outpkt->size || inl != 0) {
> > + av_log(avctx, AV_LOG_ERROR, "Unable to recode subtitle event \"%s\" "
> > + "from %s to UTF-8\n", inpkt->data, avctx->sub_charenc);
> > + av_free_packet(&tmp);
> > + ret = AVERROR(errno);
> > + goto end;
> > + }
> > + outpkt->size -= outl;
> > + outpkt->data[outpkt->size - 1] = '\0';
> > +
> > +end:
> > + if (cd != (iconv_t)-1)
> > + iconv_close(cd);
> > + return ret;
> > +}
> > +#endif
> > +
> > int avcodec_decode_subtitle2(AVCodecContext *avctx, AVSubtitle *sub,
> > int *got_sub_ptr,
> > AVPacket *avpkt)
> > @@ -1831,16 +1898,25 @@ int avcodec_decode_subtitle2(AVCodecContext *avctx, AVSubtitle *sub,
> > avcodec_get_subtitle_defaults(sub);
> >
> > if (avpkt->size) {
> > + AVPacket pkt_recoded;
> > AVPacket tmp = *avpkt;
> > int did_split = av_packet_split_side_data(&tmp);
> > //apply_param_change(avctx, &tmp);
> >
> > - avctx->pkt = &tmp;
> > + pkt_recoded = tmp;
>
> > +#if HAVE_ICONV
>
> Maybe put the body of recode_subtitle() under #if HAVE_ICONV, with a #else
> clause "av_assert(!avctx->sub_charenc); return 0;", to have less ifdefry?
>
Good idea, will be done in the next patch.
> > + if (recode_subtitle(avctx, &pkt_recoded, &tmp) < 0)
> > + pkt_recoded = tmp;
>
> Why ignore the error?
>
Because it doesn't sound fatal. And to be honest, because I need to do
larger changes to the code in this case (or use a goto, see the split
thing). It can be done in a later commit.
> > +#endif
> > +
> > + avctx->pkt = &pkt_recoded;
> >
> > if (avctx->pkt_timebase.den && avpkt->pts != AV_NOPTS_VALUE)
> > sub->pts = av_rescale_q(avpkt->pts,
> > avctx->pkt_timebase, AV_TIME_BASE_Q);
> > - ret = avctx->codec->decode(avctx, sub, got_sub_ptr, &tmp);
> > + ret = avctx->codec->decode(avctx, sub, got_sub_ptr, &pkt_recoded);
> > + if (tmp.data != pkt_recoded.data)
>
> > + av_free(pkt_recoded.data);
>
> Why av_free_packet at some places and av_free at others?
>
av_free_packet is used for the temporary packet created in the scope of
recode_subtitle in case of error, where its content is completely
unrelated to the local packet here. Here pkt_recoded is a copy of tmp, so
it shares also stuff like the extradata split, which we don't want to
destroy. The only thing that CAN differ from tmp is it's data/size,
allocated inside recode_subtitle. This is what we destroy here.
[...]
--
Clément B.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 490 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20130104/df33f92e/attachment.asc>
More information about the ffmpeg-devel
mailing list