[FFmpeg-devel] [PATCH] get_cabac_inline_x86: Don't inline if 32-bit clang on windows

Wed Aug 18 13:01:21 EEST 2021

On Tue, 17 Aug 2021, James Almer wrote:

> On 8/17/2021 12:35 PM, Christopher Degawa wrote:
>> Fixes https://trac.ffmpeg.org/ticket/8903
>> 
>> relevant https://github.com/msys2/MINGW-packages/discussions/9258
>> 
>> Signed-off-by: Christopher Degawa <ccom at randomderp.com>
>> ---
>>   libavcodec/x86/cabac.h | 9 +++++++--
>>   1 file changed, 7 insertions(+), 2 deletions(-)
>> 
>> diff --git a/libavcodec/x86/cabac.h b/libavcodec/x86/cabac.h
>> index 53d74c541e..b046a56a6b 100644
>> --- a/libavcodec/x86/cabac.h
>> +++ b/libavcodec/x86/cabac.h
>> @@ -177,8 +177,13 @@
>>     #if HAVE_7REGS && !BROKEN_COMPILER
>>   #define get_cabac_inline get_cabac_inline_x86
>> -static av_always_inline int get_cabac_inline_x86(CABACContext *c,
>> -                                                 uint8_t *const state)
>> +static
>> +#if defined(_WIN32) && !defined(_WIN64) && defined(__clang__)
>
> Can you do some benchmarks to see how not inlining this compares to simply 
> disabling this code for this target? Because it sounds like you may want to 
> add this case to the BROKEN_COMPILER macro, and not use this code at all.

I tried benchmarking it, and in short, this patch seems to be the best 
solution.

I tested 3 configurations: with this patch (changing av_always_inline into 
av_noinline), with BROKEN_COMPILER set (skipping these inline asm 
functions), and configured with --cpu=i686 (which passes -march=i686 to 
the compiler, which disallows the use of inline MMX/SSE). I benchmarked 
single-threaded decoding of a high bitrate H264 clip (listing the lowest 
measured time out of 3 runs):

av_noinline: 90.94 seconds
BROKEN_COMPILER: 98.92 seconds
-march=i686: 94.63 seconds

(The fact that building with -march=i686 is faster than using some but not 
all inline MMX/SSE is a bit surprising.)
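
For reference, the av_noinline configuration can be sketched roughly as 
below. This is only an illustration, not the patch itself: the demo_* 
names and the placeholder function body are hypothetical, and the real 
av_always_inline / av_noinline definitions live in libavutil/attributes.h. 
Only the preprocessor condition is taken from the quoted diff.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for FFmpeg's av_always_inline / av_noinline. */
#if defined(__GNUC__) || defined(__clang__)
#  define demo_always_inline inline __attribute__((always_inline))
#  define demo_noinline      __attribute__((noinline))
#else
#  define demo_always_inline inline
#  define demo_noinline
#endif

/* Same condition as in the quoted patch: 32-bit Windows built with clang. */
#if defined(_WIN32) && !defined(_WIN64) && defined(__clang__)
#  define demo_cabac_inline demo_noinline
#else
#  define demo_cabac_inline demo_always_inline
#endif

/* Placeholder body; the real function consists of inline asm. */
static demo_cabac_inline int demo_get_bit(const uint8_t *state, int idx)
{
    assert(state != 0);
    return state[idx] & 1;
}
```

The point of the helper macro is that only the inlining attribute changes 
per target, while the asm implementation stays in use everywhere.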

I also ran the same test on x86_64 (on a different machine, with Apple 
Clang), comparing av_noinline and BROKEN_COMPILER against the default 
configuration (using av_always_inline):

av_always_inline: 74.65 seconds
av_noinline: 73.74 seconds
BROKEN_COMPILER: 78.10 seconds

So av_noinline actually seems to be generally favourable here (and for 
some reason, it is even a bit faster than the always_inline case, although 
I'm not sure whether that difference is reproducible in general).
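
To check whether such a small gap is repeatable, one could time more runs 
with a "lowest of N" helper along these lines. This is a minimal sketch 
and not the harness used for the numbers above: best_of_n and 
demo_workload are hypothetical names, and the workload is a dummy loop 
standing in for a decode run.

```c
#include <assert.h>
#include <time.h>

/* Run fn(arg) `runs` times and return the lowest elapsed time in seconds.
 * clock() reports processor time on most platforms, which is close enough
 * to wall time for a single-threaded workload. */
static double best_of_n(void (*fn)(void *), void *arg, int runs)
{
    double best = -1.0;

    assert(fn != 0 && runs > 0);
    for (int i = 0; i < runs; i++) {
        clock_t start = clock();
        double elapsed;

        fn(arg);
        elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
        if (best < 0.0 || elapsed < best)
            best = elapsed;
    }
    return best;
}

/* Hypothetical workload standing in for a single-threaded decode. */
static void demo_workload(void *arg)
{
    volatile unsigned long *sink = arg;

    for (unsigned long i = 0; i < 1000000UL; i++)
        *sink += i;
}
```

Taking the minimum rather than the mean filters out runs disturbed by 
other system activity, which matters when the difference being measured 
is around one percent.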

// Martin