[FFmpeg-devel] new packed pixel formats (machine vision)

Wed Oct 9 16:10:27 EEST 2024

Hi Lynne,

On Wed, Oct 9, 2024 at 12:52 AM Lynne via ffmpeg-devel
<ffmpeg-devel at ffmpeg.org> wrote:
>
> On 08/10/2024 21:17, Diederick C. Niehorster wrote:
> > Dear Lynne,
> >
> > On Tue, Oct 8, 2024 at 1:11 PM Lynne via ffmpeg-devel
> > <ffmpeg-devel at ffmpeg.org> wrote:
> >
> > Thank you for your quick and helpful answer! However I have several questions.
> >
> >> We have support for AV_PIX_FMT_BAYER_RGGB16, since its a common Bayer
> >> layout that cinema cameras output, so its definitely within the scope
> >> and not some application-specific pixfmt.
> >> RGGB10 is just a bitpacked version.
> >
> > Good!
> >
> >> Unfortunately, we do not directly support 10bit packed pixel formats,
> >> since we can't fit them into our definition, as we only support
> >> byte-aligned formats.
> >
> > Non byte-aligned formats can be represented with
> > AV_PIX_FMT_FLAG_BITSTREAM right? I see AV_PIX_FMT_XV30BE as (the only)
> > example. I am quite possibly misunderstanding.
> > My first example AV_PIX_FMT_BAYER_RGGB10 is byte-aligned by the way,
> > but the problem is that the R and B components would have a depth of
> > 2.5 bits (10/4) in the scheme that ffmpeg uses, so can't be correctly
> > defined. Though i wonder if a rounded value (one up to 3 other down to
> > 2) is the solution here, since these are only informative (correct?)
> > and 3+5+2=10 so would be correct for this 10bit format.
>
> Nope, AV_PIX_FMT_FLAG_BITSTREAM is for a very special case where all
> components are aligned and repeat on a 32-bits basis.
> If using it was an option, we wouldn't have bitpacked_enc or v210enc/dec.

Fair enough, makes sense!

> >> We treat those as codecs, for example AV_CODEC_ID_V210 and
> >> AV_CODEC_ID_BITPACKED.
> >> The format would have to be implemented as a codec, with a decoder that
> >> accepts AV_CODEC_ID_RGGB10 and outputs AV_PIX_FMT_RGGB16, setting
> >> avctx->bits_per_sample to 10 to indicate how many bits are used.
> >
> > Hmm, but how would that work? If i understand correctly, I would
> > package the raw image data in AVPackets and use the decoder I'd write
> > to turn them into AVFrames, that i can then use as i wish.
> > That is a lot more complicated than adding these as pixel formats and
> > having swscale support them as an input format, since then I could
> > directly package the video data in an AVFrame and benefit from auto
> > conversion insertion during format negotiation and feed these new
> > pixel formats into anything without needing to special case with the
> > extra decoder in between.
>
> That is how it must be. Unless you want to refactor swscale and our
> entire codebase to allow such formats, which would be a lot more work.

I'd like to explore this, since it is important enough to me that I
could do the work. Of course, any design should leave the majority of
the code base (that is, all the existing usages of existing pixel
formats) unaffected. That means that AVPixFmtDescriptor should be
extended in a backwards-compatible way to accommodate the new formats.
Here is a first attempt, mostly to get the discussion going.

First, to cover all the use cases, two new flags would be needed. Their
usage is shown in the example formats further down.
// new flag to denote that the indicated bit depths should be divided by
// 100, because the effective bit depth is fractional
AV_PIX_FMT_FLAG_DEPTH_FRACTION100
Then for RGGB10, the component bit depths can be stored as 250, 500,
250, with the flag indicating that these values should be divided by
100 when interpreting them (giving 2.5, 5 and 2.5 bits).
This allows storing many reasonable values, but is not the most
flexible. An alternative would be a flag
AV_PIX_FMT_FLAG_DEPTH_INT16_RATIONAL
and to pack a 16-bit numerator and a 16-bit denominator into the
bit depth int. That can store any ratio that is likely to ever be
needed. I think this is the preferable solution.
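
To make the INT16_RATIONAL idea a bit more concrete, here is a minimal
sketch of how such a packed depth could be written and read back. The
helper names (depth_pack, depth_num, depth_den, depth_bits) are made up
for illustration and are not existing libavutil API, and I am assuming
the numerator goes in the upper 16 bits and the denominator in the
lower 16 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, not part of libavutil. */

/* Pack numerator and denominator into one int as (num << 16) | den. */
static inline int depth_pack(uint16_t num, uint16_t den)
{
    assert(den != 0);
    assert(num <= 0x7FFF);   /* keep the result non-negative in a 32-bit int */
    return (int)(((uint32_t)num << 16) | den);
}

/* Extract the two halves again. */
static inline int depth_num(int packed) { return (packed >> 16) & 0xFFFF; }
static inline int depth_den(int packed) { return packed & 0xFFFF; }

/* Effective depth in bits, e.g. 2.5 for 10/4. */
static inline double depth_bits(int packed)
{
    return (double)depth_num(packed) / depth_den(packed);
}
```

With this layout, depth_pack(10, 4) gives 655364 and depth_pack(10, 2)
gives 655362. Code that does not know about the flag would see these as
huge integer depths, so every reader of the depth field would have to
check the flag first.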

Second, a flag is needed to indicate that this is a bitpacked format
(implying it's not byte-aligned):
// indicates formats that are bit-wise packed in a way that is not
// aligned on 1, 2 or 4 bytes (e.g. four 10-bit values in 5 bytes)
AV_PIX_FMT_FLAG_BITPACKED_NONALIGNED
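
For illustration, this is roughly what reading such a "four 10-bit
values in 5 bytes" layout could look like. It is only a sketch: the
function name unpack10_lsb is made up, and I am assuming LSB-first
packing (the first value sits in the lowest bits of the first byte).
Other 10-bit packings order the bits differently, so the real layout
has to come from the camera or transport standard in question.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: unpack n_groups groups of 5 bytes into
 * 4 * n_groups samples stored in 16-bit words.
 * ASSUMPTION: LSB-first packing, see the note above. */
static void unpack10_lsb(const uint8_t *src, uint16_t *dst, size_t n_groups)
{
    for (size_t g = 0; g < n_groups; g++) {
        /* Gather the 40 bits of one group into a single integer. */
        uint64_t bits = 0;
        for (int i = 0; i < 5; i++)
            bits |= (uint64_t)src[5 * g + i] << (8 * i);

        /* Peel off four 10-bit values, lowest bits first. */
        for (int i = 0; i < 4; i++)
            dst[4 * g + i] = (uint16_t)((bits >> (10 * i)) & 0x3FF);
    }
}
```

Under that assumption the five bytes 0x01 0x08 0x30 0x00 0x01 unpack to
the values 1, 2, 3, 4.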

Here are some example new pixel formats:
    [AV_PIX_FMT_BAYER_RGGB10] = {
        .name = "bayer_rggb10",
        .nb_components = 3,
        .log2_chroma_w = 0,
        .log2_chroma_h = 0,
        .comp = {
            { 0, 2, 0, 0, 655364 },  /* 2.5: 10/4, (10<<16) + 4 */
            { 0, 2, 0, 0, 655362 },  /* 5: 10/2, (10<<16) + 2 */
            { 0, 2, 0, 0, 655364 },  /* 2.5: 10/4 */
        },
        .flags = AV_PIX_FMT_FLAG_RGB | AV_PIX_FMT_FLAG_BAYER |
                 AV_PIX_FMT_FLAG_DEPTH_INT16_RATIONAL,
    },
    [AV_PIX_FMT_BAYER_RGGB12] = {
        .name = "bayer_rggb12",
        .nb_components = 3,
        .log2_chroma_w = 0,
        .log2_chroma_h = 0,
        .comp = {
            { 0, 2, 0, 0, 3 },
            { 0, 2, 0, 0, 6 },
            { 0, 2, 0, 0, 3 },
        },
        .flags = AV_PIX_FMT_FLAG_RGB | AV_PIX_FMT_FLAG_BAYER,
    },
    [AV_PIX_FMT_GRAY10P] = {
        .name = "gray10p",
        .nb_components = 1,
        .log2_chroma_w = 0,
        .log2_chroma_h = 0,
        .comp = {
            { 0, 2, 0, 0, 10 },       /* Y */
        },
        .flags = AV_PIX_FMT_FLAG_BITPACKED_NONALIGNED,
    },
    [AV_PIX_FMT_BAYER_RGGB10P] = {
        .name = "bayer_rggb10p",
        .nb_components = 3,
        .log2_chroma_w = 0,
        .log2_chroma_h = 0,
        .comp = {
            { 0, 2, 0, 0, 655364 },  /* 2.5: 10/4, (10<<16) + 4 */
            { 0, 2, 0, 0, 655362 },  /* 5: 10/2, (10<<16) + 2 */
            { 0, 2, 0, 0, 655364 },  /* 2.5: 10/4 */
        },
        .flags = AV_PIX_FMT_FLAG_RGB | AV_PIX_FMT_FLAG_BAYER |
                 AV_PIX_FMT_FLAG_DEPTH_INT16_RATIONAL |
                 AV_PIX_FMT_FLAG_BITPACKED_NONALIGNED,
    },
    [AV_PIX_FMT_BAYER_RGGB12P] = {
        .name = "bayer_rggb12p",
        .nb_components = 3,
        .log2_chroma_w = 0,
        .log2_chroma_h = 0,
        .comp = {
            { 0, 2, 0, 0, 3 },
            { 0, 2, 0, 0, 6 },
            { 0, 2, 0, 0, 3 },
        },
        .flags = AV_PIX_FMT_FLAG_RGB | AV_PIX_FMT_FLAG_BAYER |
                 AV_PIX_FMT_FLAG_BITPACKED_NONALIGNED,
    },

The latter three are tricky: they need some further computation to be
fully described.
That is, they require solving for the smallest n such that
n*sum(component_bit_depths) is a multiple of 8. So, for the above three:
gray10p: sum(component_bit_depths)=10: 4*10==5*8 -> 4 values in 5 bytes
bayer_rggb10p: sum(component_bit_depths)=10: 4*10==5*8 -> 4 values in 5 bytes
bayer_rggb12p: sum(component_bit_depths)=12: 2*12==3*8 -> 2 values in 3 bytes
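
The computation above boils down to a gcd: the smallest n is
8/gcd(bits, 8), and the group then occupies bits/gcd(bits, 8) bytes. A
minimal sketch, with made-up names (gcd_u, struct packed_group,
packed_group_for) that do not exist in FFmpeg:

```c
#include <assert.h>

/* Greatest common divisor (Euclid). */
static unsigned gcd_u(unsigned a, unsigned b)
{
    while (b) {
        unsigned t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Hypothetical: smallest group of pixels that ends on a byte boundary. */
struct packed_group {
    unsigned pixels; /* n: pixels per group */
    unsigned bytes;  /* bytes occupied by those n pixels */
};

static struct packed_group packed_group_for(unsigned bits_per_pixel)
{
    assert(bits_per_pixel > 0);
    unsigned g = gcd_u(bits_per_pixel, 8);
    struct packed_group pg = { 8 / g, bits_per_pixel / g };
    return pg;
}
```

For 10 bits per pixel this gives 4 pixels in 5 bytes, and for 12 bits
per pixel 2 pixels in 3 bytes, matching the table above.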

So I think that these pixel formats can be fully described by the
existing AVPixFmtDescriptor and two new flags.

I have not thought about whether this would also allow turning v210
into a pixel format and deprecating the encoder (presumably a good
thing), or whether this scheme then runs into a limitation.

Looking forward to hearing what you/the list think!

All the best,
Dee