[FFmpeg-devel] (no subject)
Niklas Haas
ffmpeg at haasn.xyz
Tue May 27 11:51:53 EEST 2025
On Tue, 27 May 2025 16:29:20 +0800 Kieran Kunhya via ffmpeg-devel <ffmpeg-devel at ffmpeg.org> wrote:
> > - adding vzeroupper: ~12%
>
> This seems quite suspicious.
> Can you explain what you are doing here?
I added a vzeroupper instruction wherever the op chain transitions from AVX
code to SSE code. For example:
Conversion pass for yuv444p -> rgba:
[ u8 XXXX -> +++X] SWS_OP_READ : 3 elem(s) planar >> 0
[ u8 ...X -> +++X] SWS_OP_CONVERT : u8 -> f32
[f32 ...X -> ...X] SWS_OP_LINEAR : matrix3+off3 [[85/73 0 1.596027 0 -222.921566] [85/73 -0.391762 -0.812968 0 135.575295] [85/73 2.017232 0 0 -276.835851] [0 0 0 1 0]]
[f32 ...X -> ...X] SWS_OP_DITHER : 16x16 matrix
[f32 ...X -> ...X] SWS_OP_MAX : {0 0 0 0} <= x
[f32 ...X -> ...X] SWS_OP_MIN : x <= {255 255 255 255}
[f32 ...X -> +++X] SWS_OP_CONVERT : f32 -> u8
^-------- vzeroupper call added here
[ u8 ...X -> ++++] SWS_OP_CLEAR : {_ _ _ 255}
[ u8 .... -> ++++] SWS_OP_WRITE : 4 elem(s) packed >> 0
yuv444p 1920x1080 -> rgba 1920x1080, flags=0x100000 dither=1, SSIM {Y=1.000000 U=0.999999 V=0.999997 A=1.000000}
time=911 us, ref=4257 us, speedup=4.669x faster
With the vzeroupper commented out:
yuv444p 1920x1080 -> rgba 1920x1080, flags=0x100000 dither=1, SSIM {Y=1.000000 U=0.999999 V=0.999997 A=1.000000}
time=1361 us, ref=4265 us, speedup=3.133x faster
In most other cases it does not matter, but in some cases, like this one,
omitting the vzeroupper introduces false dependencies on the dirty upper
halves of the ymm registers.
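
For scale, the two timings above can be converted into relative figures. This is a minimal sketch in Python that uses only the numbers printed in the benchmark output; the helper names `slowdown` and `time_saved` are made up for illustration and are not part of any FFmpeg tool.

```python
# Timings copied from the yuv444p -> rgba benchmark output above (microseconds).
WITH_VZEROUPPER_US = 911
WITHOUT_VZEROUPPER_US = 1361


def slowdown(fast_us: float, slow_us: float) -> float:
    """Extra time taken by the slow run, as a fraction of the fast run."""
    return slow_us / fast_us - 1.0


def time_saved(fast_us: float, slow_us: float) -> float:
    """Time removed by the fast run, as a fraction of the slow run."""
    return 1.0 - fast_us / slow_us


if __name__ == "__main__":
    extra = slowdown(WITH_VZEROUPPER_US, WITHOUT_VZEROUPPER_US)
    saved = time_saved(WITH_VZEROUPPER_US, WITHOUT_VZEROUPPER_US)
    print(f"without vzeroupper: {extra:.1%} slower")
    print(f"with vzeroupper:    {saved:.1%} less time")
```

For this particular conversion that works out to roughly 49% more time without the vzeroupper, or about 33% less time with it, which is well above the ~12% figure quoted for the change as a whole.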
Another example is grayf32 -> yuv444p, which goes from 268 us to 296 us
(about 10% slower) if I remove the vzeroupper calls. In general, any
conversion that switches between 32-bit floats (512 bits per block) and 8-bit
integers (128 bits per block) is affected.
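
The block sizes quoted above imply 16 elements per block in both formats. The sketch below works through that arithmetic and the resulting register footprint. It assumes the f32 ops run on 256-bit ymm registers and the u8 ops on 128-bit xmm registers; that split is inferred from the AVX to SSE transition described above and is not stated explicitly. The function names are hypothetical.

```python
XMM_BITS = 128  # SSE register width
YMM_BITS = 256  # AVX/AVX2 register width


def elems_per_block(block_bits: int, elem_bits: int) -> int:
    """Number of elements that fit into one block of the given size."""
    return block_bits // elem_bits


def registers_per_block(block_bits: int, reg_bits: int) -> int:
    """Registers needed to hold one block (ceiling division)."""
    return -(-block_bits // reg_bits)


if __name__ == "__main__":
    # f32: 512 bits per block, u8: 128 bits per block (figures from the text).
    print("f32 elems/block:", elems_per_block(512, 32))
    print("u8  elems/block:", elems_per_block(128, 8))
    print("ymm regs per f32 block:", registers_per_block(512, YMM_BITS))
    print("xmm regs per u8 block: ", registers_per_block(128, XMM_BITS))
```

Under these assumptions a float block occupies two ymm registers while the same 16 elements as u8 fit in a single xmm register, so the f32 -> u8 convert is where 256-bit code hands over to 128-bit code.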
>
> Kieran