[FFmpeg-devel] Enhancement layers in FFmpeg

Hendrik Leppkes h.leppkes at gmail.com
Mon Aug 1 16:45:19 EEST 2022


On Mon, Aug 1, 2022 at 1:25 PM Niklas Haas <ffmpeg at haasn.xyz> wrote:
>
> Hey,
>
> We need to think about possible ways to implement reasonably-transparent
> support for enhancement layers in FFmpeg. (SVC, Dolby Vision, ...).
> There are more open questions than answers here.
>
> From what I can tell, these are basically separate bitstreams that carry
> some amount of auxiliary information needed to reconstruct the
> high-quality bitstream. That is, they are not independent, but need to
> be merged with the original bitstream somehow.
>
> How do we architecturally fit this into FFmpeg? Do we define a new codec
> ID for each (common/relevant) combination of base codec and enhancement
> layer, e.g. HEVC+DoVi, H.264+SVC, ..., or do we transparently handle it
> for the base codec ID and control it via a flag? Do the enhancement
> layer packets already make their way to the codec, and if not, how do we
> ensure that this is the case?

The EL on Blu-rays is a separate stream, so that would need to be
handled in some fashion. Unless it wouldn't. See below.

>
> Can the decoder itself recursively initialize a sub-decoder for the
> second bitstream? And if so, does the decoder apply the actual
> transformation, or does it merely attach the EL data to the AVFrame
> somehow in a way that can be used by further filters or end users?

My main question is: how closely related are those streams?
I know that the Dolby EL can be decoded basically entirely separately
from the main video stream. But the Dolby EL might be the special case
here; I have no experience with SVC.

If the enhancement layer is entirely independent, like the Dolby EL,
does avcodec need to do anything at all? It _can_ decode the stream
today; a user application could decode both the main stream and the EL
stream and link them together, without any changes in avcodec.
Do we need to complicate the situation by forcing this into avcodec?

Decoding them in entirely separate decoder instances has the advantage
of being able to use hardware decoding for the main stream and
software for the EL, or both in hardware, or whatever one prefers.

Of course, this applies to the special situation of the Dolby EL,
which is entirely independent, at least in its primary source,
Blu-ray. I think MKV might mix both into one stream, which is an
unfortunate design decision on their part.

avfilter, for example, is already set up to synchronize two incoming
streams (e.g. for overlay), so the same mechanic could be used to pass
it to a processing filter.

>
> (What about the case of Dolby Vision, which iirc requires handling the
> DoVi RPU metadata before the EL can be applied? What about instances
> where the user wants the DoVi/EL application to happen on GPU, e.g. via
> libplacebo in mpv/vlc?)
>

Yes, processing should be left to dedicated filters.

> How does this metadata need to be attached? A second AVFrame reference
> inside the AVFrame? Raw data in a big side data struct?

For Dolby EL, no attachment is necessary if we follow the above
concept of just not having avcodec care.

- Hendrik
