[FFmpeg-devel] [PATCH] Extra build options for ALS (and others)
Thilo Borgmann
thilo.borgmann
Fri Nov 27 22:23:20 CET 2009
Måns Rullgård wrote:
> Thilo Borgmann <thilo.borgmann at googlemail.com> writes:
>
>> Måns Rullgård wrote:
>>> Thilo Borgmann <thilo.borgmann at googlemail.com> writes:
>>>
>>>> Måns Rullgård wrote:
>>>>> Thilo Borgmann <thilo.borgmann at googlemail.com> writes:
>>>>>
>>>>>> Måns Rullgård wrote:
>>>>>>> Thilo Borgmann <thilo.borgmann at googlemail.com> writes:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> recently the need for an extra build option for the ALS decoder arose.
>>>>>>> Is it impossible to achieve the desired outcome with some combination
>>>>>>> of always_inline, noinline, and flatten attributes?
>>>>>> No. See [PATCH] Split reading and decoding of blocks in ALS.
>>>>>>
>>>>>> Although I've managed to have the functions from alsdec.c inlined
>>>>>> manually according to the grep'ed output of the assembler code, it seems
>>>>>> that manually inlining functions from within that .c file alone, using
>>>>>> this technique, is not enough.
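For readers unfamiliar with the three GCC attributes named in the question above, here is a minimal, generic sketch. The function names are hypothetical and this is not code from alsdec.c. FFmpeg's av_always_inline is, on GCC, a wrapper around the first of these attributes.

```c
#include <assert.h>

/* always_inline: GCC inlines this into every caller regardless of its
 * size heuristics (and reports an error if it cannot). */
static inline __attribute__((always_inline)) int gain(int x, int factor)
{
    return x * factor;
}

/* noinline: stays an out-of-line function; callers emit a real call. */
__attribute__((noinline)) int clamp_sample(int v, int lo, int hi)
{
    return v < lo ? lo : v > hi ? hi : v;
}

/* No attribute: GCC decides on its own whether to inline this. */
static int add_bias(int v)
{
    return v + 1;
}

/* flatten: GCC tries to inline every call made inside this function,
 * including calls introduced by that inlining. Current GCC documentation
 * states that callees declared noinline are still not inlined, so
 * clamp_sample() is expected to remain a real call here. */
__attribute__((flatten)) int process(const int *in, int *out, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++) {
        out[i] = clamp_sample(add_bias(gain(in[i], 4)), -100, 100);
        sum += out[i];
    }
    return sum;
}
```

Compiling this with `gcc -O2 -S` and searching the output for `call` is the same kind of check used in this thread; one would expect gain() and add_bias() to disappear into process() while clamp_sample() remains a call.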
>>>>> I'm confused. Can it be done in the C code only or not? This kind of
>>>>> issue should really not be solved in the makefile.
>>>> The issue is the big slowdown. The patch that causes this splits a big
>>>> function into two, which are then called successively.
>>>>
>>>> To overcome the slowdown issue, I inspected the functions being inlined
>>>> with and without the -finline-limit option. I can use av_always_inline
>>>> for many functions within alsdec.c to get the same functions inlined
>>>> as with -finline-limit.
>>>>
>>>> Unfortunately, using -finline-limit removes the slowdown introduced by
>>>> the patch while using av_always_inline does not.
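The split under discussion can be pictured with the following minimal sketch. All names are hypothetical and the block layout is invented; it only shows the shape of the change: one function that used to parse and reconstruct a block now calls a read phase and a decode phase in succession, with both halves force-inlined the way av_always_inline would do it.

```c
#include <assert.h>

/* Local stand-in for av_always_inline on GCC (hypothetical name). */
#define ALWAYS_INLINE inline __attribute__((always_inline))

/* Hypothetical per-block parameters: filled by the read phase,
 * consumed by the decode phase. */
typedef struct BlockParams {
    int constant;  /* nonzero: the block is one repeated value */
    int value;     /* the repeated value for constant blocks */
    int shift;     /* shift applied to every reconstructed sample */
} BlockParams;

/* Read phase: only parses header fields into the struct. */
static ALWAYS_INLINE void read_block(const int *header, BlockParams *bp)
{
    bp->constant = header[0];
    bp->value    = header[1];
    bp->shift    = header[2];
}

/* Decode phase: only reconstructs samples from the parsed fields.
 * Multiplication is used instead of << so negative samples are well defined. */
static ALWAYS_INLINE void decode_block(const BlockParams *bp,
                                       const int *residual, int *out, int n)
{
    for (int i = 0; i < n; i++) {
        int s = bp->constant ? bp->value : residual[i];
        out[i] = s * (1 << bp->shift);
    }
}

/* The former single function now calls the two halves one after the other. */
void read_and_decode_block(const int *header, const int *residual,
                           int *out, int n)
{
    BlockParams bp;
    read_block(header, &bp);
    decode_block(&bp, residual, out, n);
}
```

The struct is the only coupling between the two phases: everything the decode phase needs must be stored in it by the read phase.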
>>> So it's not doing the same thing. What is it doing differently?
>>> Where did you get the limit number from?
>>>
>> All function calls within alsdec.s when using -finline-limit=4096:
>> 1 call L1102
>> 1 call L138
>> 1 call L456
>> 2 call L___udivdi3$stub
>> 10 call L_av_freep$stub
>> 1 call L_av_get_bits_per_sample_format$stub
>> 12 call L_av_log$stub
>> 5 call L_av_log_missing_feature$stub
>> 8 call L_av_malloc$stub
>> 2 call L_av_mallocz$stub
>> 1 call L_ff_mpeg4audio_get_config$stub
>> 6 call L_memcpy$stub
>> 2 call L_memmove$stub
>> 1 call L_memset$stub
>> 2 call _decode_blocks_ind
>> 4 call _decode_end
>> 36 call _decode_rice
>> 10 call _get_bits_long
>> 11 call _parse_bs_info
>> 2 call _zero_remaining
>>
>> All function calls within alsdec.s when using av_always_inline on many
>> functions. This is designed to inline the same functions from alsdec.c as
>> the unpatched alsdec.c does without any extra build option:
>> 1 call L1561
>> 1 call L176
>> 1 call L21
>> 2 call L___udivdi3$stub
>> 10 call L_av_freep$stub
>> 1 call L_av_get_bits_per_sample_format$stub
>> 13 call L_av_log$stub
>> 5 call L_av_log_missing_feature$stub
>> 8 call L_av_malloc$stub
>> 2 call L_av_mallocz$stub
>> 1 call L_ff_mpeg4audio_get_config$stub
>> 1 call L_memcpy$stub
>> 1 call L_memmove$stub
>> 2 call L_memset$stub
>> 8 call ___inline_memcpy_chk
>> 2 call ___inline_memmove_chk
>> 6 call _align_get_bits
>> 5 call _av_ceil_log2
>> 4 call _av_clip
>> 4 call _decode_end
>> 47 call _get_bits
>> 90 call _get_bits1
>> 3 call _get_bits_count
>> 61 call _get_bits_left
>> 39 call _get_bits_long
>> 4 call _get_sbits_long
>> 60 call _get_unary
>> 2 call _init_get_bits
>> 3 call _parse_bs_info
>> 3 call _read_time
>> 7 call _skip_bits
>> 2 call _skip_bits1
>> 5 call _skip_bits_long
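The call tallies above were apparently produced by grepping the generated assembly. As an illustration of what is being counted, here is a small, hypothetical C helper that counts `call` instructions to a given target in a list of assembly lines. It assumes the plain `call` mnemonic seen in these listings; other targets or tools may spell it differently.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Counts lines of the form "<ws>call<ws>TARGET[<ws>...]" whose operand
 * equals target exactly (so "_get_bits" does not match "_get_bits1"). */
int count_calls(const char *const *lines, size_t nlines, const char *target)
{
    size_t tlen = strlen(target);
    int count = 0;

    for (size_t i = 0; i < nlines; i++) {
        const char *p = lines[i];
        while (isspace((unsigned char)*p))
            p++;
        /* Require the exact mnemonic "call" followed by whitespace. */
        if (strncmp(p, "call", 4) != 0 || !isspace((unsigned char)p[4]))
            continue;
        p += 4;
        while (isspace((unsigned char)*p))
            p++;
        /* Require the operand to end right after the target name. */
        if (strncmp(p, target, tlen) == 0 &&
            (p[tlen] == '\0' || isspace((unsigned char)p[tlen])))
            count++;
    }
    return count;
}
```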
>
> Not inlining get_bits() and friends will certainly slow things down.
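To see why the uninlined get_bits() family hurts, it helps to look at how little work each read does. The following is a deliberately simplified MSB-first bit reader written only for this illustration; it is not FFmpeg's GetBitContext, and all names are hypothetical. Each read is a handful of operations, so with dozens of such call sites in a decoder, the cost of a real out-of-line call can easily outweigh the work itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified bit reader state (hypothetical, not GetBitContext). */
typedef struct BitReader {
    const uint8_t *buf;
    size_t size_in_bits;
    size_t index;          /* current bit position */
} BitReader;

static inline void br_init(BitReader *br, const uint8_t *buf, size_t bytes)
{
    br->buf = buf;
    br->size_in_bits = bytes * 8;
    br->index = 0;
}

/* One bit: a bounds check, a load, a shift, a mask and an increment.
 * Reads past the end return 0. */
static inline unsigned br_get_bit(BitReader *br)
{
    unsigned bit = 0;
    if (br->index < br->size_in_bits)
        bit = (br->buf[br->index >> 3] >> (7 - (br->index & 7))) & 1;
    br->index++;
    return bit;
}

/* n bits (n <= 32), MSB first, built from single-bit reads for clarity. */
static inline uint32_t br_get_bits(BitReader *br, int n)
{
    uint32_t v = 0;
    while (n-- > 0)
        v = (v << 1) | br_get_bit(br);
    return v;
}

/* Unary code: counts 1 bits up to a terminating 0 (consumed), at most max. */
static inline int br_get_unary(BitReader *br, int max)
{
    int n = 0;
    while (n < max && br_get_bit(br))
        n++;
    return n;
}
```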
>
>> So -finline-limit inlines many functions in the object file that are
>> not defined in alsdec.c itself (such as the get_bits() family), which
>> might be the reason for the performance difference.
>>
>> But using -finline-limit does not yield a speed gain for the unpatched
>> file! So there might be something else going on that I don't see.
>>
>> The value of 4096 was chosen arbitrarily. As long as I don't know
>> exactly why -finline-limit removes the slowdown, and whether it can be
>> replaced by another approach, there is no point in looking for a more
>> suitable value...
>
> We should do some benchmarks using that flag globally and see what
> happens. Maybe we'd gain from using it everywhere.
As Michael said, this would be a big test across different platforms and
compilers, which I cannot do alone, so several people would have to take
part, provided a first benchmark indicates that it is worth testing.
Also, I lack a good idea of how to test this efficiently, i.e. how to
measure the execution time of ffmpeg without other factors such as hard
drive I/O playing a predominant role.
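One way to keep hard drive I/O out of the numbers is to load the input into memory first and time only the CPU-bound part, repeating it several times and keeping the fastest run. The sketch below assumes a POSIX system with clock_gettime(); bench_min_per_iter, SumCtx and the dummy sum_workload are hypothetical names, and in practice the workload would be a wrapper that decodes frames from an in-memory buffer.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Signature of the code under test. */
typedef void (*bench_fn)(void *ctx);

static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Runs fn `iters` times per trial, repeats `trials` times and returns the
 * fastest per-iteration time in seconds. No file access happens inside
 * the timed region, so disk I/O cannot influence the result. */
double bench_min_per_iter(bench_fn fn, void *ctx, int iters, int trials)
{
    double best = -1.0;
    for (int t = 0; t < trials; t++) {
        double start = now_seconds();
        for (int i = 0; i < iters; i++)
            fn(ctx);
        double elapsed = (now_seconds() - start) / iters;
        if (best < 0.0 || elapsed < best)
            best = elapsed;
    }
    return best;
}

/* Dummy workload standing in for "decode frames from memory". */
typedef struct SumCtx {
    const int32_t *samples;
    size_t n;
    volatile int64_t result;   /* volatile so the work is not optimized away */
} SumCtx;

void sum_workload(void *opaque)
{
    SumCtx *c = opaque;
    int64_t acc = 0;
    for (size_t i = 0; i < c->n; i++)
        acc += c->samples[i];
    c->result = acc;
}
```

Taking the minimum rather than the mean is a common choice for microbenchmarks, since interference from other processes only makes a run slower. FFmpeg's in-tree START_TIMER/STOP_TIMER macros are an alternative for timing a single function inside the decoder.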
But would a general gain from this option make it a good candidate to be
added globally? If so, could we add it specifically for ALS for the time
being instead of holding back ALS decoder development completely?
Benchmarking and testing will surely take a lot of time...
-Thilo