[FFmpeg-devel] [PATCH 09/11] avutil/half2float: use native _Float16 if available
Timo Rothenpieler
timo at rothenpieler.org
Thu Aug 11 00:58:44 EEST 2022
On 10.08.2022 23:03, Andreas Rheinhardt wrote:
> Timo Rothenpieler:
>> _Float16 support was available on arm/aarch64 for a while, and with gcc
>> 12 was enabled on x86 as long as SSE2 is supported.
>>
>> If the target arch supports f16c, gcc emits fairly efficient assembly,
>> taking advantage of it. This is the case on x86-64-v3 or higher.
>> Without f16c, it emulates it in software using sse2 instructions.
>
> How is the performance of this emulation compared to our current code?
> And how is the native _Float16 performance compared to the current code?
The sse2 emulation actually performs surprisingly poorly. In a quick
test:
./ffmpeg -s 512x512 -f rawvideo -pix_fmt rgbaf16 -i /dev/zero -vf format=yuv444p -f null -
_Float16 full SSE2 emulation:
frame=50074 fps=848 q=-0.0 size=N/A time=00:33:22.96 bitrate=N/A speed=33.9x
_Float16 f16c accelerated (Zen2, --cpu=znver2):
frame=50636 fps=1965 q=-0.0 Lsize=N/A time=00:33:45.40 bitrate=N/A speed=78.6x
classic half2float full software implementation:
frame=49926 fps=1605 q=-0.0 Lsize=N/A time=00:33:17.00 bitrate=N/A speed=64.2x
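Normalizing these single-run fps figures against the classic software path puts the gap in proportion (the speed= values give the same ratios, and one quick run each is only indicative):

```latex
\frac{1965}{1605} \approx 1.22 \qquad \frac{848}{1605} \approx 0.53 \qquad \frac{1965}{848} \approx 2.32
```

So the sse2 emulation runs at roughly half the speed of the classic code, while the f16c path is about 22% faster than it.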
Unfortunately, I don't see a good way to detect f16c at runtime without
resorting to fully hand-written assembly. Since f16c instructions only
ever operate on 4 or 8 values at a time, hand-written assembly would take
away the compiler's ability to make good use of them.
However, the HAVE_FLOAT16 checks could be paired with a check for
__F16C__, which appears to be the universally established define for
"the code is being built with f16c optimizations".
That at least avoids the apparently quite slow sse2 emulation.
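A minimal sketch of that gating, under stated assumptions: HAVE_FLOAT16 stands in for the configure-generated macro (defaulted to 0 here so the snippet builds on its own), half_to_float() is a hypothetical helper name, and the fallback is a plain bit-manipulation conversion standing in for FFmpeg's existing software path, not a copy of it.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: normally provided by configure; defaulted here for a standalone build. */
#ifndef HAVE_FLOAT16
#define HAVE_FLOAT16 0
#endif

#if HAVE_FLOAT16 && defined(__F16C__)
/* Native path: only taken when the compiler may emit f16c instructions,
 * so the slow sse2 emulation of _Float16 is never used. */
static inline float half_to_float(uint16_t h)  /* hypothetical helper */
{
    _Float16 f;
    memcpy(&f, &h, sizeof(f));  /* reinterpret the IEEE binary16 bits */
    return (float)f;
}
#else
/* Software fallback: expand binary16 bits into binary32 bits. */
static inline float half_to_float(uint16_t h)  /* hypothetical helper */
{
    uint32_t sign = (uint32_t)(h & 0x8000) << 16;
    uint32_t exp  = (h >> 10) & 0x1f;
    uint32_t mant = h & 0x3ff;
    uint32_t bits;
    float f;

    if (exp == 0x1f) {              /* Inf or NaN: max exponent, keep payload */
        bits = sign | 0x7f800000 | (mant << 13);
    } else if (exp != 0) {          /* normal: rebias exponent from 15 to 127 */
        bits = sign | ((exp + 112) << 23) | (mant << 13);
    } else if (mant == 0) {         /* signed zero */
        bits = sign;
    } else {                        /* subnormal half: normalize into a float */
        exp = 113;
        while (!(mant & 0x400)) {
            mant <<= 1;
            exp--;
        }
        mant &= 0x3ff;              /* drop the now-implicit leading 1 */
        bits = sign | (exp << 23) | (mant << 13);
    }
    memcpy(&f, &bits, sizeof(f));
    return f;
}
#endif
```

Building with e.g. -march=x86-64-v3 (or -mf16c) defines __F16C__, so a build that also has HAVE_FLOAT16 set takes the _Float16 branch, while a plain SSE2 build keeps the software path.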