[FFmpeg-devel] How to support audio data placed within video data
Manuel Lauss
manuel.lauss at gmail.com
Sat Nov 30 14:51:36 EET 2024
On Sat, Nov 30, 2024 at 9:50 AM Anton Khirnov <anton at khirnov.net> wrote:
>
> Quoting Manuel Lauss (2024-11-28 21:58:09)
> > On Thu, Nov 28, 2024 at 3:19 PM Anton Khirnov <anton at khirnov.net> wrote:
> > >
> > > Quoting Manuel Lauss (2024-11-26 15:25:30)
> > > > Hello,
> > > >
> > > > I'd like to add some audio support for the old libavformat/smush
> > > > formats (mainly the "ANIM" variants; the "SANM" variant already has
> > > > audio decoding support).
> > > >
> > > > The audio data (16-bit stereo PCM), however, is placed at (more or
> > > > less) random places within the video data, with no relation to the
> > > > actual video frame it is embedded in (i.e. most files place a few
> > > > hundred ms of audio in the first video frame, while the rest are
> > > > roughly the length of a video frame).
> > > >
> > > > What is the best way to support this scenario?
> > >
> > > Meaning you have to parse the coded bytestream to get the audio? Is
> > > there at least some signalling that audio is present at all?
> >
> > No, audio and video are distinct chunks which are in turn contained
> > in a super chunk. This super chunk (a "FRME", since it encodes exactly
> > one video frame) is passed by smush.c as a video packet to sanm.c.
> > That makes some sense, since besides pure video/audio data it also
> > contains (delta-)palette data, instructions to store/restore a frame,
> > subtitles for this video frame, ... and also a few kB of audio that
> > is not tied (on the timeline) to the video frame in this super chunk.
> > It is not even a full standalone audio packet either: it may depend
> > on data from the previous FRME and also provide a few bytes of data
> > for the following one.
> > (I think this was designed for streaming from slow CD-ROMs.)
> >
> > The end of one super chunk then signals that the video/audio data
> > assembled so far should be presented.
>
> Then it seems preferable to have the demuxer extract the audio.
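Something like the sketch below, I suppose (untested; the PSAD/IACT
tags and the even-byte chunk padding are from my reading of the ANIM
variants, and the two append_* helpers are hypothetical stand-ins for
the actual buffering):

static int smush_split_frme(AVFormatContext *s, AVPacket *vpkt)
{
    SMUSHContext *smush = s->priv_data;
    AVIOContext  *pb    = s->pb;
    uint32_t frme_size, pos = 0;

    if (avio_rb32(pb) != MKBETAG('F','R','M','E'))
        return AVERROR_INVALIDDATA;
    frme_size = avio_rb32(pb);

    while (pos + 8 <= frme_size) {
        uint32_t tag  = avio_rb32(pb);
        uint32_t size = avio_rb32(pb);

        switch (tag) {
        case MKBETAG('P','S','A','D'):   /* ANIM v1 audio */
        case MKBETAG('I','A','C','T'):   /* ANIM v2 audio */
            /* collect the raw audio bytes in a reassembly buffer;
             * an audio packet is emitted only once a standalone
             * frame's worth of samples has accumulated, since a
             * chunk may depend on the previous FRME */
            append_audio_bytes(smush, pb, size);
            break;
        default:
            /* FOBJ, NPAL/XPAL, subtitles, STOR/FTCH, ... all stay
             * together in the video packet, as before */
            append_video_bytes(smush, vpkt, pb, size);
            break;
        }
        avio_skip(pb, size & 1);         /* assumed even padding */
        pos += 8 + size + (size & 1);
    }
    return 0;
}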
>
> > > The options I can think of are:
> > > * parse the bytestream in the demuxer
> >
> > So pass the individual chunks of the super chunk off to the
> > video/audio codec as packets? Can I invoke the codec's
> > decode function multiple times per frame?
>
> In principle yes, but we generally prefer (when feasible) for a packet
> to contain exactly one frame. Though I don't quite see why that should
> be needed - you're saying above that the super chunk contains exactly
> one video frame.
Both video and audio can have an arbitrary number of related chunks,
arranged in random order within the super chunk.
Oh, can I pass the same AVPacket that is passed to the video decoder
to an audio decoder as well? Basically processing the FRME twice: once
for the video parts and once for the audio parts?
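Roughly like this, as a sketch (pending_audio and frme_pkt are made-up
context fields just to illustrate the idea, and read_frme() is a
hypothetical helper; av_packet_ref() only adds a reference, so the
FRME data itself is not copied):

static int smush_read_packet(AVFormatContext *s, AVPacket *pkt)
{
    SMUSHContext *smush = s->priv_data;
    int ret;

    if (smush->pending_audio) {
        /* second call: hand the same FRME out again, as audio */
        ret = av_packet_ref(pkt, smush->frme_pkt);
        if (ret < 0)
            return ret;
        pkt->stream_index    = smush->audio_stream_index;
        smush->pending_audio = 0;
        av_packet_unref(smush->frme_pkt);
        return 0;
    }

    ret = read_frme(s, smush->frme_pkt);  /* reads one whole FRME */
    if (ret < 0)
        return ret;

    /* first call: hand the FRME out as video */
    ret = av_packet_ref(pkt, smush->frme_pkt);
    if (ret < 0)
        return ret;
    pkt->stream_index    = smush->video_stream_index;
    smush->pending_audio = 1;
    return 0;
}

Each decoder would then pick out the chunks it cares about and skip
the rest of the FRME.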
Thanks!
Manuel