[FFmpeg-devel] gcc: Remove auto-vectorization limitation.

Martin Storsjö martin at martin.st
Wed May 21 15:22:01 EEST 2025


On Wed, 21 May 2025, Andreas Rheinhardt wrote:

> Martin Storsjö:
>> On Wed, 21 May 2025, Andreas Rheinhardt wrote:
>> 
>>> Jiawei:
>>>> This patch modifies the FFmpeg build system to remove the explicit
>>>> disabling of GCC's auto-vectorization feature.
>>>>
>>>> Modern GCC versions (>= 10.0) have demonstrated stable
>>>> auto-vectorization capabilities through extensive optimizations in
>>>> loop analysis and SIMD code generation. The explicit
>>>> -fno-tree-vectorize flag, originally added in commit 973859f (2009)
>>>> to work around early GCC vectorization instability, is no longer
>>>> necessary.
>>>>
>>>> Key improvements justifying this change:
>>>> 1. Enhanced heuristics for loop vectorization cost models
>>>> 2. Mature handling of alignment and memory access patterns
>>>> 3. Robust fallback mechanisms for unsupported architectures
>>>>
>>>> This change allows FFmpeg to benefit from automated SIMD optimizations
>>>> when built with -O3 optimization level, particularly improving
>>>> performance on x86_64 (AVX), ARM64 (SVE) and RISC-V (RVV) architectures.
>>>>
>>>> [1] https://git.ffmpeg.org/gitweb/ffmpeg.git/commit/973859f5230e77beea7bb59dc081870689d6d191
>>>>
>>>> ---
>>>>  configure | 1 -
>>>>  1 file changed, 1 deletion(-)
>>>>
>>>> diff --git a/configure b/configure
>>>> index 3730b0524c..b9e95ce4ec 100755
>>>> --- a/configure
>>>> +++ b/configure
>>>> @@ -7656,7 +7656,6 @@ if enabled icc; then
>>>>              disable aligned_stack
>>>>      fi
>>>>  elif enabled gcc; then
>>>> -    check_optflags -fno-tree-vectorize
>>>>      check_cflags -Werror=format-security
>>>>      check_cflags -Werror=implicit-function-declaration
>>>>      check_cflags -Werror=missing-prototypes
>>>
>>> FYI: The last discussion about auto-vectorization is here:
>>> https://ffmpeg.org/pipermail/ffmpeg-devel/2022-July/299405.html
>>> It contains a report about a failing build with vectorization enabled:
>>> https://ffmpeg.org/pipermail/ffmpeg-devel/2022-July/299421.html
>>> I don't know whether this is still reproducible with the latest GCC.
>> 
>> The issue which was reported last time, when compiling for i686 mingw32
>> with --cpu=haswell, seems to have gone away in
>> 182663a58a7a099e02e76da3b0f96d63e5c26a6d, where we made the whole
>> problematic x86 inline cabac assembly noinline on i386. (That whole
>> inline assembly block has been problematic in a large number of cases
>> anyway.)
>> 
>
> So there are currently no known miscompilations due to vectorization
> with GCC?

I'm not aware of any, but I haven't tested widely. It certainly is worth
evaluating.

(From dav1d, I can anecdotally add that autovectorization does seem to
help, somewhat, especially when there's not 100% assembly coverage for the
use case. For some cases it makes things slower than without
autovectorization, but generally the net result is positive.)
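
(To make the kind of code in question concrete, here is a minimal sketch.
add_sat_u8 is a made-up helper for illustration only, not a function from
FFmpeg or dav1d. It is the sort of plain C fallback loop that the
vectorizer can usually turn into SIMD code when no handwritten assembly
covers the case.)

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not an FFmpeg or dav1d function: saturating add
 * of two byte arrays. There is no dependency between iterations, and the
 * restrict qualifiers promise that the buffers do not overlap, which is
 * the kind of loop GCC's vectorizer can typically handle. */
void add_sat_u8(uint8_t *restrict dst, const uint8_t *restrict a,
                const uint8_t *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Widen before adding so the sum cannot wrap around. */
        unsigned sum = (unsigned)a[i] + b[i];
        dst[i] = sum > 255 ? 255 : (uint8_t)sum;
    }
}
```

(Assuming the snippet is saved as sat.c, a name chosen here for
illustration, building it with something like "gcc -O3 -fopt-info-vec -c
sat.c" should make GCC report whether it vectorized the loop. Adding
-fno-tree-vectorize, as configure currently does, keeps the loop scalar.)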

// Martin

