[FFmpeg-devel] [PATCH] lavc: support subtitles charset conversion.

Fri Jan 4 11:35:51 CET 2013

On Fri, Jan 04, 2013 at 11:20:12AM +0100, Nicolas George wrote:
> Le quintidi 15 nivôse, an CCXXI, Clement Boesch a écrit :
> > I was wondering what in my patch was preventing any code evolution, so my
> > question was about any interface with the user.
> > 
> > Now that the character re-encoding is done before decoding, do you see any
> > design problem with it?
> 
> Well, you answer your own question:
> 
> > Note about the FIXME in the patch: we should add some kind of flags to the
> > codec because the recoding will obviously have unwanted behaviour with
> > bitmap sub packets.
> 
> I had not thought of that. You are perfectly right.
> 

I think Wolfram is working on something like this. But I may send a patch
anyway soon (sorry).

> If we go ahead with the mode field I suggested, then the solution would
> probably be to let the default be "do nothing", or even maybe "croak if the
> encoding is set"; adding "if (!mode) mode = PRE" in all text decoders init
> is not much work, and no more than adding flags on the codecs.
> 
> OTOH, knowing what are text subtitles codecs and what are bitmap subtitles
> codecs would be useful not only there, so a codec flags would probably be
> nice too.
> 

Adding a "support charset recoding" codec description property sounds like
the most simple and reliable thing to do.

> More generally, this issue is an example of why this should not be rushed:
> this is a tricky issue with a lot of pitfalls. I believe each time a new
> pitfall is found and avoided, it should reset the forethought time.
> 
> Now, for the technical comments:
> 
[...]
> >  volatile int ff_avcodec_locked;
> >  static int volatile entangled_thread_counter = 0;
> > @@ -1058,6 +1061,12 @@ int attribute_align_arg avcodec_open2(AVCodecContext *avctx, const AVCodec *code
> >              ret = AVERROR(EINVAL);
> >              goto free_and_end;
> >          }
> > +        if (!HAVE_ICONV && avctx->codec_type == AVMEDIA_TYPE_SUBTITLE && avctx->sub_charenc) {
> > +            av_log(avctx, AV_LOG_ERROR, "Character encoding subtitles conversion needs "
> > +                   "a libavcodec built with iconv support.\n");
> > +            ret = AVERROR(EINVAL);
> 
> ENOSYS? PATCHWELCOME?
> 

Patch welcome? I'm not sure I want a fficonv…

OK for ENOSYS, changed locally.

> 
> > +            goto free_and_end;
> > +        }
> >      }
> >  end:
> >      ff_unlock_avcodec();
> > @@ -1816,6 +1825,64 @@ int attribute_align_arg avcodec_decode_audio4(AVCodecContext *avctx,
> >      return ret;
> >  }
> >  
> > +#if HAVE_ICONV
> > +#define UTF8_MAX_BYTES 4 /* 5 and 6 bytes sequences should not be used */
> 
> ???
> 
> > +static int recode_subtitle(AVCodecContext *avctx,
> > +                           AVPacket *outpkt, const AVPacket *inpkt)
> > +{
> > +    iconv_t cd = (iconv_t)-1;
> > +    int ret = 0;
> > +    char *inb, *outb;
> > +    size_t inl, outl;
> > +    AVPacket tmp;
> > +
> > +    if (!avctx->sub_charenc) // FIXME add a codec prop for sub codec supporting it
> > +        goto end;
> > +
> > +    cd = iconv_open("UTF-8", avctx->sub_charenc);
> > +    if (cd == (iconv_t)-1) {
> > +        av_log(avctx, AV_LOG_ERROR, "Unable to open iconv context "
> > +               "with input character encoding \"%s\"\n", avctx->sub_charenc);
> > +        ret = AVERROR(errno);
> > +        goto end;
> > +    }
> > +
> > +    inb = inpkt->data;
> > +    inl = inpkt->size;
> > +
> > +    if (inl >= INT_MAX / UTF8_MAX_BYTES - FF_INPUT_BUFFER_PADDING_SIZE) {
> > +        av_log(avctx, AV_LOG_ERROR, "Subtitles packet is too big for recoding\n");
> > +        ret = AVERROR(ENOMEM);
> > +        goto end;
> > +    }
> > +
> > +    ret = av_new_packet(&tmp, inl * UTF8_MAX_BYTES);
> 
> + FF_INPUT_BUFFER_PADDING_SIZE
> 

av_new_packet does it

> > +    if (ret < 0)
> > +        goto end;
> > +    outpkt->data = tmp.data;
> > +    outpkt->size = tmp.size;
> > +    outb = outpkt->data;
> > +    outl = outpkt->size;
> > +
> > +    if (iconv(cd, &inb, &inl, &outb, &outl) == (size_t)-1 ||
> > +        iconv(cd, NULL, NULL, &outb, &outl) == (size_t)-1 ||
> > +        outl >= outpkt->size || inl != 0) {
> > +        av_log(avctx, AV_LOG_ERROR, "Unable to recode subtitle event \"%s\" "
> > +               "from %s to UTF-8\n", inpkt->data, avctx->sub_charenc);
> > +        av_free_packet(&tmp);
> > +        ret = AVERROR(errno);
> > +        goto end;
> > +    }
> > +    outpkt->size -= outl;
> > +    outpkt->data[outpkt->size - 1] = '\0';
> > +
> > +end:
> > +    if (cd != (iconv_t)-1)
> > +        iconv_close(cd);
> > +    return ret;
> > +}
> > +#endif
> > +
> >  int avcodec_decode_subtitle2(AVCodecContext *avctx, AVSubtitle *sub,
> >                               int *got_sub_ptr,
> >                               AVPacket *avpkt)
> > @@ -1831,16 +1898,25 @@ int avcodec_decode_subtitle2(AVCodecContext *avctx, AVSubtitle *sub,
> >      avcodec_get_subtitle_defaults(sub);
> >  
> >      if (avpkt->size) {
> > +        AVPacket pkt_recoded;
> >          AVPacket tmp = *avpkt;
> >          int did_split = av_packet_split_side_data(&tmp);
> >          //apply_param_change(avctx, &tmp);
> >  
> > -        avctx->pkt = &tmp;
> > +        pkt_recoded = tmp;
> 
> > +#if HAVE_ICONV
> 
> Maybe put the body of recode_subtitle() under #if HAVE_ICONV, with a #else
> clause "av_assert(!avctx->sub_charenc); return 0;", to have less ifdefry?
> 

Good idea, will be done in the next patch.

> > +        if (recode_subtitle(avctx, &pkt_recoded, &tmp) < 0)
> > +            pkt_recoded = tmp;
> 
> Why ignore the error?
> 

Because it doesn't sound fatal. And to be honest, because I need to do
larger changes to the code in this case (or use a goto, see the split
thing). It can be done in a later commit.

> > +#endif
> > +
> > +        avctx->pkt = &pkt_recoded;
> >  
> >          if (avctx->pkt_timebase.den && avpkt->pts != AV_NOPTS_VALUE)
> >              sub->pts = av_rescale_q(avpkt->pts,
> >                                      avctx->pkt_timebase, AV_TIME_BASE_Q);
> > -        ret = avctx->codec->decode(avctx, sub, got_sub_ptr, &tmp);
> > +        ret = avctx->codec->decode(avctx, sub, got_sub_ptr, &pkt_recoded);
> > +        if (tmp.data != pkt_recoded.data)
> 
> > +            av_free(pkt_recoded.data);
> 
> Why av_free_packet at some places and av_free at others?
> 

av_free_packet is used for the temporary packet created in the scope of
recode_subtitle in case of error, where its content is completely
unrelated to the local packet here. Here pkt_recoded is a copy of tmp, so
it shares also stuff like the extradata split, which we don't want to
destroy. The only thing that CAN differ from tmp is it's data/size,
allocated inside recode_subtitle. This is what we destroy here.

[...]

-- 
Clément B.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 490 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20130104/df33f92e/attachment.asc>