[FFmpeg-devel] [PATCH v3 1/7] swscale/range_convert: saturate output instead of limiting input

Sun Dec 1 15:21:46 EET 2024

On Sun, Dec 01, 2024 at 02:39:19AM +0100, Michael Niedermayer wrote:
> Hi Ramiro
> 
> On Sat, Nov 30, 2024 at 04:23:36PM +0100, Ramiro Polla wrote:
> > For bit depths <= 14, the result is saturated to 15 bits.
> > For bit depths > 14, the result is saturated to 19 bits.
> > 
> > x86_64:
> > chrRangeFromJpeg8_1920_c:    5827.4   5804.5  ( 1.00x)
> > chrRangeFromJpeg16_1920_c:   5793.2   5792.8  ( 1.00x)
> > chrRangeToJpeg8_1920_c:     11726.2   9388.6  ( 1.25x)
> > chrRangeToJpeg16_1920_c:    10610.8   5796.5  ( 1.83x)
> > lumRangeFromJpeg8_1920_c:    4165.7   4147.9  ( 1.00x)
> > lumRangeFromJpeg16_1920_c:   4530.0   4529.0  ( 1.00x)
> > lumRangeToJpeg8_1920_c:      6044.8   5694.1  ( 1.06x)
> > lumRangeToJpeg16_1920_c:     5343.6   5334.2  ( 1.00x)
> > 
> > aarch64 A55:
> > chrRangeFromJpeg8_1920_c:   28839.3  28833.8  ( 1.00x)
> > chrRangeFromJpeg16_1920_c:  28843.8  28842.8  ( 1.00x)
> > chrRangeToJpeg8_1920_c:     44196.1  23070.6  ( 1.92x)
> > chrRangeToJpeg16_1920_c:    36526.7  17313.8  ( 2.11x)
> > lumRangeFromJpeg8_1920_c:   15384.3  15388.1  ( 1.00x)
> > lumRangeFromJpeg16_1920_c:  15390.1  15388.0  ( 1.00x)
> > lumRangeToJpeg8_1920_c:     23066.7  19226.2  ( 1.20x)
> > lumRangeToJpeg16_1920_c:    19224.6  19225.5  ( 1.00x)
> > 
> > aarch64 A76:
> > chrRangeFromJpeg8_1920_c:    6316.2   6317.8  ( 1.00x)
> > chrRangeFromJpeg16_1920_c:   6321.9   6322.9  ( 1.00x)
> > chrRangeToJpeg8_1920_c:     11389.3   9287.1  ( 1.23x)
> > chrRangeToJpeg16_1920_c:     9514.4   6104.9  ( 1.56x)
> > lumRangeFromJpeg8_1920_c:    4376.0   4359.1  ( 1.00x)
> > lumRangeFromJpeg16_1920_c:   4437.9   4358.8  ( 1.02x)
> > lumRangeToJpeg8_1920_c:      6667.0   5957.2  ( 1.12x)
> > lumRangeToJpeg16_1920_c:     6062.5   6072.5  ( 1.00x)
> > 
> > NOTE: all simd optimizations for range_convert have been disabled
> >       except for x86, which already had the same behaviour.
> >       they will be re-enabled when they are fixed for each architecture.
> > ---
> >  libswscale/aarch64/swscale.c                  |  5 +++++
> >  libswscale/loongarch/swscale_init_loongarch.c |  5 +++++
> >  libswscale/riscv/swscale.c                    |  5 +++++
> >  libswscale/swscale.c                          | 21 ++++++++++++-------
> >  libswscale/x86/range_convert.asm              |  3 ---
> >  5 files changed, 29 insertions(+), 10 deletions(-)
> 
> [...]
> 
> > @@ -160,8 +160,10 @@ static void chrRangeToJpeg_c(int16_t *dstU, int16_t *dstV, int width)
> >  {
> >      int i;
> >      for (i = 0; i < width; i++) {
> > -        dstU[i] = (FFMIN(dstU[i], 30775) * 4663 - 9289992) >> 12; // -264
> > -        dstV[i] = (FFMIN(dstV[i], 30775) * 4663 - 9289992) >> 12; // -264
> > +        int U = (dstU[i] * 4663 - 9289992) >> 12; // -264
> > +        int V = (dstV[i] * 4663 - 9289992) >> 12; // -264
> 
> The way this is written, it triggers undefined behavior if the input to the
> function is too large.

I misread the code somehow. The FFMIN only protects the 16-bit output, which the
new code does too, so the change is OK.
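
To illustrate the point above, here is a minimal standalone sketch, not the
actual patch. It compares limiting the input (old code) with saturating the
output (new code) for the chroma to-JPEG formula quoted in the hunk. The helper
names and the SKETCH_MIN macro are hypothetical stand-ins, and the clamp value
(1 << 15) - 1 is assumed from the "saturated to 15 bits" wording of the commit
message. It also assumes a 32-bit int and an arithmetic right shift for
negative values.

```c
#include <stdint.h>

/* Hypothetical local stand-in for FFmpeg's FFMIN macro. */
#define SKETCH_MIN(a, b) ((a) > (b) ? (b) : (a))

/* Old approach: cap the input at 30775 so the scaled result fits in 15 bits. */
static int16_t chr_to_jpeg_limit_input(int16_t in)
{
    return (int16_t)((SKETCH_MIN(in, 30775) * 4663 - 9289992) >> 12);
}

/* New approach: compute in int (the int16_t input is promoted, so the
 * multiply cannot overflow a 32-bit int), then saturate the output.
 * The clamp value is an assumption based on the commit message. */
static int16_t chr_to_jpeg_saturate_output(int16_t in)
{
    int v = (in * 4663 - 9289992) >> 12;
    return (int16_t)SKETCH_MIN(v, (1 << 15) - 1);
}

/* Returns 1 if both variants agree for every non-negative int16_t input. */
static int sketch_variants_agree(void)
{
    for (int in = 0; in <= INT16_MAX; in++) {
        if (chr_to_jpeg_limit_input((int16_t)in) !=
            chr_to_jpeg_saturate_output((int16_t)in))
            return 0;
    }
    return 1;
}
```

In this sketch, both variants only guard the 16-bit store. The largest
intermediate value, 32767 * 4663, is far below INT_MAX either way.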

thx

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Rewriting code that is poorly written but fully understood is good.
Rewriting code that one doesn't understand is a sign that one is less smart
than the original author, trying to rewrite it will not make it better.