[FFmpeg-devel] [PATCH] get_cabac_inline_x86: Don't inline if 32-bit clang on windows
James Almer
jamrial at gmail.com
Thu Aug 19 21:40:51 EEST 2021
On 8/18/2021 7:01 AM, Martin Storsjö wrote:
> On Tue, 17 Aug 2021, James Almer wrote:
>
>> On 8/17/2021 12:35 PM, Christopher Degawa wrote:
>>> Fixes https://trac.ffmpeg.org/ticket/8903
>>>
>>> relevant https://github.com/msys2/MINGW-packages/discussions/9258
>>>
>>> Signed-off-by: Christopher Degawa <ccom at randomderp.com>
>>> ---
>>> libavcodec/x86/cabac.h | 9 +++++++--
>>> 1 file changed, 7 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/libavcodec/x86/cabac.h b/libavcodec/x86/cabac.h
>>> index 53d74c541e..b046a56a6b 100644
>>> --- a/libavcodec/x86/cabac.h
>>> +++ b/libavcodec/x86/cabac.h
>>> @@ -177,8 +177,13 @@
>>> #if HAVE_7REGS && !BROKEN_COMPILER
>>> #define get_cabac_inline get_cabac_inline_x86
>>> -static av_always_inline int get_cabac_inline_x86(CABACContext *c,
>>> - uint8_t *const state)
>>> +static
>>> +#if defined(_WIN32) && !defined(_WIN64) && defined(__clang__)
>>
>> Can you do some benchmarks to see how not inlining this compares to
>> simply disabling this code for this target? Because it sounds like you
>> may want to add this case to the BROKEN_COMPILER macro, and not use
>> this code at all.
>
> I tried benchmarking it, and in short, this patch seems to be the best
> solution.
>
> I tested 3 configurations; with this patch (changing av_always_inline
> into av_noinline), setting BROKEN_COMPILER (skipping these inline asm
> functions) and configuring with --cpu=i686 (which means it passes
> -march=i686 to the compiler, which disallows the use of inline MMX/SSE).
> I benchmarked singlethreaded decoding of a high bitrate H264 clip
> (listing the lowest measured time out of 3 runs):
>
> av_noinline: 90.94 seconds
> BROKEN_COMPILER: 98.92 seconds
> -march=i686: 94.63 seconds
>
> (The fact that building with -march=i686 is faster than using some but
> not all inline MMX/SSE is a bit surprising.)
>
> I also tested the same setup on x86_64 (on a different machine, with
> Apple Clang), where I tested the above and compare it with the default
> configuration using av_always_inline):
>
> av_always_inline: 74.65 seconds
> av_noinline: 73.74 seconds
> BROKEN_COMPILER: 78.10 seconds
>
> So av_noinline actually seems to be generally favourable here (and for
> some reason, actually a bit faster than the always_inline case, although
> I'm not sure if that bit is deterministic in general or not).
>
>
> // Martin
Alright, LGTM then.
More information about the ffmpeg-devel
mailing list