[FFmpeg-devel] [RFC] Support multiple frames in a single AVPacket in avcodec_decode_subtitle2

Ronald S. Bultje rsbultje at gmail.com
Wed Oct 16 20:56:10 CEST 2013


Hi,

On Wed, Oct 16, 2013 at 2:48 PM, wm4 <nfxjfg at googlemail.com> wrote:

> On Wed, 16 Oct 2013 14:15:51 -0400
> "Ronald S. Bultje" <rsbultje at gmail.com> wrote:
>
> > Hi,
> >
> > On Wed, Oct 16, 2013 at 1:38 PM, wm4 <nfxjfg at googlemail.com> wrote:
> >
> > > On Tue, 1 Oct 2013 23:16:01 +0200 (CEST)
> > > Marton Balint <cus at passwd.hu> wrote:
> > >
> > > > Hi,
> > > >
> > > > When I implemented the DVB teletext decoder, I faced a problem: If
> > > > multiple teletext pages are in a single teletext packet, the decoder
> > > > has no way to return multiple AVSubtitles. So the current decoder
> > > > only returns one AVSubtitle in that case, an AVSubtitle containing
> > > > the first decoded page from the packet.
> > > >
> > > > This is not a problem if the user wants to decode only a single
> > > > teletext page (subtitle page), because the same page is not sent
> > > > twice in a single packet. However, if somebody wants to decode all
> > > > pages, he probably won't be able to do so without losing a page here
> > > > or there.
> > > >
> > > > I could have split the teletext PES packets (usually around 1472
> > > > bytes) at the demuxer level into 46-byte packets to overcome this,
> > > > but I thought it would be much better to extend the API the same way
> > > > it is used now for audio decoding, where a single packet can contain
> > > > multiple frames.
> > > >
> > > > If I combine this with CODEC_CAP_DELAY, the teletext decoder can
> > > > store the remaining pages of a teletext packet (unfortunately
> > > > libzvbi parses all pages in the packet in a single pass), and return
> > > > them to the user on the next call to avcodec_decode_subtitle2. In
> > > > that case the decoder obviously would not consume anything from the
> > > > next packet until its buffer containing teletext pages from the
> > > > previous packet is empty.
> > > >
> > > > If we do this, we will have to make sure that the current subtitle
> > > > decoders will always return the full buffer size as the number of
> > > > consumed bytes. I've checked, and it seems that only 3 decoders are
> > > > problematic, but they only need a one-line patch to fix them.
> > > > Movtext (patch is already on the mailing list), srtdec and dvbsub
> > > > are the three.
> > > >
> > > > So, what do you think?
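
The audio-style pattern referred to above is the consumed-bytes loop. Just so
we are all picturing the same thing, a rough caller-side sketch of how that
could look for subtitles under this proposal (handle_subtitle() is only a
placeholder, avctx/pkt come from the usual setup, and the partial consumption
is the proposed semantics, not something today's subtitle decoders guarantee):

    /* sketch only - proposed semantics, not current behaviour */
    AVPacket tmp = pkt;               /* shallow copy of the demuxed packet */
    while (tmp.size > 0) {
        AVSubtitle sub;
        int got_sub = 0;
        int ret = avcodec_decode_subtitle2(avctx, &sub, &got_sub, &tmp);
        if (ret < 0)
            break;                    /* decode error */
        tmp.data += ret;              /* advance past the consumed bytes */
        tmp.size -= ret;
        if (got_sub) {
            handle_subtitle(&sub);    /* placeholder for user code */
            avsubtitle_free(&sub);
        }
    }
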
> > >
> > > Sounds like a bad idea.
> > >
> > > First, this kind of partial packet decoding seems to be in decline in
> > > ffmpeg. Video doesn't use it anymore, audio uses it only for some
> > > obscure formats (hopefully one day it won't require this anymore). It's
> > > also additional pain for the user to keep around a packet and to slice
> > > it. This is a pretty unintuitive API and increases the amount of
> > > boilerplate needed to decode something. It's also not entirely robust
> > > and foolproof. And now you want to introduce a new API which uses this
> > > API anti-pattern?
> > >
> > > Second, the API is in need of a better design. AVSubtitle still sucks,
> > > and I'm very doubtful about how subtitle->ASS conversion is done. I
> > > think the next iteration of the subtitle API should fix this, and not
> > > just be another shot in the dark to make teletext work for now.
> > >
> > > Are you sure there's no better way to shoehorn proper teletext decoding
> > > into ffmpeg?
> >
> >
> > Video and audio are different in that the subpackets for e.g. voice audio
> > are in the realm of several tens of bytes (e.g. 50 bytes), which means the
> > (memory/cpu cycle) overhead of giving each packet its own AVPacket
> > container would be highly disproportionate. For video, packet size is
> > several orders of magnitude more than that, so the tradeoff is entirely
> > different between the two - hence the expected optimal (and therefore
> > proposed) solution is different.
>
> True. Though it seems that often audio is split into subpackets by
> libavformat (or even the container) anyway.
>
> Is there any reason why avcodec_decode_audio4 can't decode all
> subpackets at once, instead of having the user do repeated calls? Since
> each decode call produces an AVFrame, I figure this would be more
> efficient in general (for the same reasons you cited).


I don't remember there being any specific reason against that; I suppose it
was simply "one call to avcodec_decode_audioX represents one call to
AVCodec->decode()". I don't have any strong objections; it might indeed be
the best of both worlds.
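
To make sure we are talking about the same thing, roughly this from the
caller's side - purely a sketch, where the "decoder swallows the whole packet
and hands out the buffered frames on later calls" part is the hypothetical
bit, and avctx, frame and pkt are assumed to be set up as usual:

    /* feed the whole packet once; assume the decoder consumes all of it
     * and buffers any additional frames internally (hypothetical) */
    int got_frame = 0;
    if (avcodec_decode_audio4(avctx, frame, &got_frame, &pkt) >= 0 && got_frame)
        process_frame(frame);             /* placeholder for user code */

    /* then drain the buffered frames with empty packets, the same way
     * CODEC_CAP_DELAY decoders are flushed at EOF today */
    AVPacket empty;
    av_init_packet(&empty);
    empty.data = NULL;
    empty.size = 0;
    do {
        got_frame = 0;
        if (avcodec_decode_audio4(avctx, frame, &got_frame, &empty) < 0)
            break;
        if (got_frame)
            process_frame(frame);
    } while (got_frame);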

Ronald

