[FFmpeg-devel] [PATCH] swscale/slice: fix init of 32 bpc planes

Mon Dec 16 12:40:50 EET 2024

On Mon, 16 Dec 2024 01:50:20 +0100 Michael Niedermayer <michael at niedermayer.cc> wrote:
> Hi Niklas
>
> On Wed, Dec 11, 2024 at 09:25:12AM +0100, Niklas Haas wrote:
> > From: Niklas Haas <git at haasn.dev>
> >
> > In input.c and output.c and many other places, swscale follows the rule of using
> > 15-bit intermediate if output bpc is <= 8, and 19-bit (inside int32_t)
> > intermediate otherwise. See e.g. the comments on hyScale() on
> > swscale_internal.h. These are also the coefficients that yuv2gbrpf32_full_X_c()
> > is using.
> >
> > In contrast to this, the plane init code in slice.c (function fill_ones) is
> > assuming that we use 35-bit intermediates (inside 64-bit integers) for this
> > case, seemingly added by commit b4967fc71c63eae8cd96f9c46cd3e1fbd705bbf9 with
> > no further justification.
> >
> > This causes a mismatch whenever the implicitly initialized plane contents leak
> > out to the output, e.g. when converting from grayscale to RGB.
> >
> > Fixes: ticket #10716
> > Signed-off-by: Niklas Haas <git at haasn.dev>
> > Sponsored-by: Sovereign Tech Fund
> > ---
> >  libswscale/slice.c | 6 +-----
> >  1 file changed, 1 insertion(+), 5 deletions(-)
>
> ultimately 32bit on the input or output side requires more than 32bit
> internally to maintain precision.
>
> if this patch makes it all match up, it's ok for now, but 18bit doesn't seem
> enough for 32bit data

What exactly is the design goal here? As a point of reference, here is how many
bits you need to represent a signal to within an error that is substantially
below the threshold of human perception in all but the most extreme synthetic
test setups:

- HDR before linearization: 14 bits
- HDR after linearization:  35 bits
- SDR before linearization: 12 bits
- SDR after linearization:  26 bits

If we relax this to the threshold that is required for visual transparency in
(non-static) video playback, we get:

- HDR before linearization: 12 bits
- HDR after linearization:  30 bits
- SDR before linearization: 10 bits
- SDR after linearization:  22 bits

Given that swscale currently does not linearize input data except in the case
of XYZ (which I want to handle differently in the near future), it's safe to
say that 18 bits of precision is more than enough to faithfully represent
any sort of non-synthetic data, as well as to capture the dynamic range of
any camera that currently exists.
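
To put a number on what 18 bits means for 32-bit integer data, here is a
minimal sketch. It is not swscale's actual code path: the helper names are
made up for illustration, and it assumes plain truncation of an unsigned
32-bit sample down to `bits` bits.

```c
#include <stdint.h>

/* Hypothetical helper (not part of swscale): truncate a 32-bit unsigned
 * sample to `bits` bits of precision, then expand it back to 32 bits.
 * Assumes 1 <= bits <= 32. */
static uint32_t requantize_u32(uint32_t v, int bits)
{
    int shift = 32 - bits;
    return (v >> shift) << shift;
}

/* Worst-case truncation error as a fraction of full scale (2^32 - 1).
 * Truncation discards at most 2^(32 - bits) - 1 code values. */
static double max_error_fraction(int bits)
{
    uint64_t lost = ((uint64_t)1 << (32 - bits)) - 1;
    return (double)lost / 4294967295.0;
}
```

With bits = 18 the worst-case error is 16383 / (2^32 - 1), i.e. a little
under 4e-6 of full scale, so the low 14 bits of a 32-bit sample are lost but
the relative error stays tiny.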

If we instead want to be able to losslessly round-trip the input data *exactly*,
then the only solution would be to also switch to a floating point internal
format, rather than upping the bit depth. Even 64-bit fixed point is nowhere
near enough to accurately round-trip a 32f input.
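
The reason is range rather than mantissa width: normal float32 values span
roughly 2^-126 to 2^128, which is far more than 64 bits of fixed point can
cover at once. A small sketch of the failure, using a hypothetical helper
(not swscale code) that assumes a signed 64-bit fixed-point format with
`frac_bits` fractional bits:

```c
#include <stdint.h>

/* Hypothetical helper: quantize x to signed 64-bit fixed point with
 * `frac_bits` fractional bits (0 <= frac_bits <= 62), then convert back.
 * Assumes |x| * 2^frac_bits fits in int64_t; larger inputs would overflow. */
static float roundtrip_fixed64(float x, int frac_bits)
{
    double scale  = (double)((uint64_t)1 << frac_bits);
    double scaled = (double)x * scale;
    /* round half away from zero, then truncate to an integer */
    int64_t q = (int64_t)(scaled + (scaled >= 0 ? 0.5 : -0.5));
    return (float)((double)q / scale);
}

/* Returns 1 if x survives the round trip bit-exactly, 0 otherwise. */
static int survives_fixed64(float x, int frac_bits)
{
    return roundtrip_fixed64(x, frac_bits) == x;
}
```

A value like 0.5f survives with 32 fractional bits, but 1e-20f is flushed to
zero even with 62 fractional bits, and at that point there is only one
integer bit left for anything above 1.0.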

The only case that can be made for a 64-bit integer format is if we want to
start linearizing HDR input images, and for some reason determine that 64-bit
fixed point math is faster than 32-bit floating point math, which would be the
most obvious candidate of intermediate format for linearized content.

>
> thx
>
>
> [...]