[FFmpeg-devel] [PATCH v3 1/7] swscale/range_convert: saturate output instead of limiting input
Michael Niedermayer
michael at niedermayer.cc
Sun Dec 1 15:21:46 EET 2024
On Sun, Dec 01, 2024 at 02:39:19AM +0100, Michael Niedermayer wrote:
> Hi Ramiro
>
> On Sat, Nov 30, 2024 at 04:23:36PM +0100, Ramiro Polla wrote:
> > For bit depths <= 14, the result is saturated to 15 bits.
> > For bit depths > 14, the result is saturated to 19 bits.
> >
> > x86_64:
> > chrRangeFromJpeg8_1920_c: 5827.4 5804.5 ( 1.00x)
> > chrRangeFromJpeg16_1920_c: 5793.2 5792.8 ( 1.00x)
> > chrRangeToJpeg8_1920_c: 11726.2 9388.6 ( 1.25x)
> > chrRangeToJpeg16_1920_c: 10610.8 5796.5 ( 1.83x)
> > lumRangeFromJpeg8_1920_c: 4165.7 4147.9 ( 1.00x)
> > lumRangeFromJpeg16_1920_c: 4530.0 4529.0 ( 1.00x)
> > lumRangeToJpeg8_1920_c: 6044.8 5694.1 ( 1.06x)
> > lumRangeToJpeg16_1920_c: 5343.6 5334.2 ( 1.00x)
> >
> > aarch64 A55:
> > chrRangeFromJpeg8_1920_c: 28839.3 28833.8 ( 1.00x)
> > chrRangeFromJpeg16_1920_c: 28843.8 28842.8 ( 1.00x)
> > chrRangeToJpeg8_1920_c: 44196.1 23070.6 ( 1.92x)
> > chrRangeToJpeg16_1920_c: 36526.7 17313.8 ( 2.11x)
> > lumRangeFromJpeg8_1920_c: 15384.3 15388.1 ( 1.00x)
> > lumRangeFromJpeg16_1920_c: 15390.1 15388.0 ( 1.00x)
> > lumRangeToJpeg8_1920_c: 23066.7 19226.2 ( 1.20x)
> > lumRangeToJpeg16_1920_c: 19224.6 19225.5 ( 1.00x)
> >
> > aarch64 A76:
> > chrRangeFromJpeg8_1920_c: 6316.2 6317.8 ( 1.00x)
> > chrRangeFromJpeg16_1920_c: 6321.9 6322.9 ( 1.00x)
> > chrRangeToJpeg8_1920_c: 11389.3 9287.1 ( 1.23x)
> > chrRangeToJpeg16_1920_c: 9514.4 6104.9 ( 1.56x)
> > lumRangeFromJpeg8_1920_c: 4376.0 4359.1 ( 1.00x)
> > lumRangeFromJpeg16_1920_c: 4437.9 4358.8 ( 1.02x)
> > lumRangeToJpeg8_1920_c: 6667.0 5957.2 ( 1.12x)
> > lumRangeToJpeg16_1920_c: 6062.5 6072.5 ( 1.00x)
> >
> > NOTE: all simd optimizations for range_convert have been disabled
> > except for x86, which already had the same behaviour.
> > they will be re-enabled when they are fixed for each architecture.
> > ---
> > libswscale/aarch64/swscale.c | 5 +++++
> > libswscale/loongarch/swscale_init_loongarch.c | 5 +++++
> > libswscale/riscv/swscale.c | 5 +++++
> > libswscale/swscale.c | 21 ++++++++++++-------
> > libswscale/x86/range_convert.asm | 3 ---
> > 5 files changed, 29 insertions(+), 10 deletions(-)
>
> [...]
>
> > @@ -160,8 +160,10 @@ static void chrRangeToJpeg_c(int16_t *dstU, int16_t *dstV, int width)
> > {
> > int i;
> > for (i = 0; i < width; i++) {
> > - dstU[i] = (FFMIN(dstU[i], 30775) * 4663 - 9289992) >> 12; // -264
> > - dstV[i] = (FFMIN(dstV[i], 30775) * 4663 - 9289992) >> 12; // -264
> > + int U = (dstU[i] * 4663 - 9289992) >> 12; // -264
> > + int V = (dstV[i] * 4663 - 9289992) >> 12; // -264
>
> The way this is written, it triggers undefined behavior if the input to the function
> is too large
I misread the code somehow. The FFMIN only protects the 16-bit output, which the
new code does too, so the change is OK.
thx
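
For reference, a minimal standalone sketch of why the two forms agree. This is
not the patch itself: the helper names and the local FFMIN macro are
hypothetical, the output clamp to (1 << 15) - 1 is assumed from the "saturated
to 15 bits" wording in the commit message, and only the constants are taken
from the quoted hunk. An int16_t input times 4663 fits comfortably in int, and
30775 is the largest input for which the result is still 32767, so limiting
the input and saturating the output should give the same values.

```c
#include <stdint.h>

/* Local stand-in for FFmpeg's FFMIN so the sketch builds on its own. */
#define FFMIN(a, b) ((a) > (b) ? (b) : (a))

/* Old form: limit the input so the result cannot exceed 15 bits. */
static int16_t chr_to_jpeg_limit_input(int16_t x)
{
    return (int16_t)((FFMIN(x, 30775) * 4663 - 9289992) >> 12);
}

/* New form: compute in int, then saturate the output.
 * The clamp value (1 << 15) - 1 is an assumption based on the
 * commit message, not copied from the patch. */
static int16_t chr_to_jpeg_saturate_output(int16_t x)
{
    int v = (x * 4663 - 9289992) >> 12;
    return (int16_t)FFMIN(v, (1 << 15) - 1);
}

/* Count int16_t inputs for which the two forms disagree. */
static int count_mismatches(void)
{
    int mismatches = 0;
    for (int x = INT16_MIN; x <= INT16_MAX; x++) {
        if (chr_to_jpeg_limit_input((int16_t)x) !=
            chr_to_jpeg_saturate_output((int16_t)x))
            mismatches++;
    }
    return mismatches;
}
```

Calling count_mismatches() from a small test program should return 0. Like the
code in swscale, the sketch relies on >> of a negative int being an arithmetic
shift.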
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Rewriting code that is poorly written but fully understood is good.
Rewriting code that one doesnt understand is a sign that one is less smart
than the original author, trying to rewrite it will not make it better.