[FFmpeg-devel] [PATCH] get_cabac_inline_x86: Don't inline if 32-bit clang on windows

Thu Aug 19 21:40:51 EEST 2021

On 8/18/2021 7:01 AM, Martin Storsjö wrote:
> On Tue, 17 Aug 2021, James Almer wrote:
> 
>> On 8/17/2021 12:35 PM, Christopher Degawa wrote:
>>> Fixes https://trac.ffmpeg.org/ticket/8903
>>>
>>> relevant https://github.com/msys2/MINGW-packages/discussions/9258
>>>
>>> Signed-off-by: Christopher Degawa <ccom at randomderp.com>
>>> ---
>>>   libavcodec/x86/cabac.h | 9 +++++++--
>>>   1 file changed, 7 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/libavcodec/x86/cabac.h b/libavcodec/x86/cabac.h
>>> index 53d74c541e..b046a56a6b 100644
>>> --- a/libavcodec/x86/cabac.h
>>> +++ b/libavcodec/x86/cabac.h
>>> @@ -177,8 +177,13 @@
>>>     #if HAVE_7REGS && !BROKEN_COMPILER
>>>   #define get_cabac_inline get_cabac_inline_x86
>>> -static av_always_inline int get_cabac_inline_x86(CABACContext *c,
>>> -                                                 uint8_t *const state)
>>> +static
>>> +#if defined(_WIN32) && !defined(_WIN64) && defined(__clang__)
>>
>> Can you do some benchmarks to see how not inlining this compares to 
>> simply disabling this code for this target? Because it sounds like you 
>> may want to add this case to the BROKEN_COMPILER macro, and not use 
>> this code at all.
> 
> I tried benchmarking it, and in short, this patch seems to be the best 
> solution.
> 
> I tested three configurations: with this patch (changing av_always_inline 
> into av_noinline), setting BROKEN_COMPILER (skipping these inline asm 
> functions), and configuring with --cpu=i686 (which passes -march=i686 to 
> the compiler, disallowing the use of inline MMX/SSE). 
> I benchmarked single-threaded decoding of a high-bitrate H.264 clip 
> (listing the lowest measured time out of 3 runs):
> 
> av_noinline: 90.94 seconds
> BROKEN_COMPILER: 98.92 seconds
> -march=i686: 94.63 seconds
> 
> (The fact that building with -march=i686 is faster than using some but 
> not all inline MMX/SSE is a bit surprising.)
> 
> I also tested the same setup on x86_64 (on a different machine, with 
> Apple Clang), where I tested the above and compared it with the default 
> configuration (using av_always_inline):
> 
> av_always_inline: 74.65 seconds
> av_noinline: 73.74 seconds
> BROKEN_COMPILER: 78.10 seconds
> 
> So av_noinline actually seems to be generally favourable here (and for 
> some reason, actually a bit faster than the always_inline case, although 
> I'm not sure if that bit is deterministic in general or not).
> 
> 
> // Martin

Alright, LGTM then.
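
For readers skimming the thread: the quoted diff is cut off right after the
new #if line, so the following is only a rough sketch of the idea being
discussed (pick a noinline attribute on 32-bit Windows clang, keep forced
inlining elsewhere), not the actual patch. All DEMO_* names and
demo_get_bit() are hypothetical stand-ins; FFmpeg's own macros are
av_always_inline and av_noinline, and the real function is
get_cabac_inline_x86().

```c
#include <stdint.h>

/* Hypothetical stand-ins for av_always_inline / av_noinline. */
#if defined(__GNUC__) || defined(__clang__)
#  define DEMO_ALWAYS_INLINE __attribute__((always_inline)) inline
#  define DEMO_NOINLINE      __attribute__((noinline))
#elif defined(_MSC_VER)
#  define DEMO_ALWAYS_INLINE __forceinline
#  define DEMO_NOINLINE      __declspec(noinline)
#else
#  define DEMO_ALWAYS_INLINE inline
#  define DEMO_NOINLINE
#endif

/* Same target test as the #if line in the quoted patch:
 * 32-bit Windows (_WIN32 without _WIN64) built with clang. */
#if defined(_WIN32) && !defined(_WIN64) && defined(__clang__)
#  define DEMO_CABAC_INLINE DEMO_NOINLINE
#else
#  define DEMO_CABAC_INLINE DEMO_ALWAYS_INLINE
#endif

/* Toy function in place of the real inline-asm CABAC decoder:
 * returns bit number idx from a packed byte array. */
static DEMO_CABAC_INLINE int demo_get_bit(const uint8_t *state, unsigned idx)
{
    return (state[idx >> 3] >> (idx & 7)) & 1;
}
```

The attribute only changes how the compiler emits the function; callers and
results are the same on every target, which is why the benchmarks above are
the deciding factor.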