[FFmpeg-devel] [PATCH] Speed up dct32() in mpegaudiodec and make it avoid trashing its input

Vitor Sessak vitor1001
Mon Jun 7 13:44:01 CEST 2010


On 06/07/2010 12:56 PM, Michael Niedermayer wrote:
> On Mon, Jun 07, 2010 at 08:33:36AM +0200, Vitor Sessak wrote:
>> On 06/07/2010 02:31 AM, Michael Niedermayer wrote:
>>> On Sun, Jun 06, 2010 at 04:16:27PM +0200, Vitor Sessak wrote:
>>>> $subj. This should make the function suitable to be moved to the common
>>>> DCT framework after my patch in the thread "[PATCH] SSE dct32()".
>>>>
>>>> Benchmarks:
>>>>
>>>> Fixed point, patched:
>>>> 4554 dezicycles in dct32, 128 runs, 0 skips
>>>> 4880 dezicycles in dct32, 256 runs, 0 skips
>>>> 5078 dezicycles in dct32, 512 runs, 0 skips
>>>> 4443 dezicycles in dct32, 1024 runs, 0 skips
>>>> 4112 dezicycles in dct32, 2048 runs, 0 skips
>>>> 4122 dezicycles in dct32, 4095 runs, 1 skips
>>>> 4054 dezicycles in dct32, 8190 runs, 2 skips
>>>> 4008 dezicycles in dct32, 16379 runs, 5 skips
>>>> 3968 dezicycles in dct32, 32759 runs, 9 skips
>>>> 3911 dezicycles in dct32, 65516 runs, 20 skips
>>>> 3868 dezicycles in dct32, 131042 runs, 30 skips
>>>> 3844 dezicycles in dct32, 262075 runs, 69 skips
>>>> 3860 dezicycles in dct32, 524151 runs, 137 skips
>>>> 3881 dezicycles in dct32, 1048328 runs, 248 skips
>>>> 3852 dezicycles in dct32, 2096579 runs, 573 skips
>>>> 3838 dezicycles in dct32, 4193100 runs, 1204 skips
>>>> 3831 dezicycles in dct32, 8386205 runs, 2403 skips
>>>
>>> Seeing the whole output is not interesting; seeing the last score
>>> of several runs is interesting.
>>
>> ok.
>>
>> Fixed point, patched:
>> 3847 dezicycles in dct32, 8386234 runs, 2374 skips
>> 3822 dezicycles in dct32, 8386575 runs, 2033 skips
>> 3846 dezicycles in dct32, 8386386 runs, 2222 skips
>>
>> Floating point, patched:
>> 3384 dezicycles in dct32_float, 8386658 runs, 1950 skips
>> 3494 dezicycles in dct32_float, 8386603 runs, 2005 skips
>> 3451 dezicycles in dct32_float, 8386525 runs, 2083 skips
>>
>> Fixed point, original:
>> 4488 dezicycles in dct32, 8385764 runs, 2844 skips
>> 4473 dezicycles in dct32, 8386027 runs, 2581 skips
>> 4485 dezicycles in dct32, 8386185 runs, 2423 skips
>>
>> Floating point, original:
>> 3781 dezicycles in dct32_float, 8386360 runs, 2248 skips
>> 3766 dezicycles in dct32_float, 8386079 runs, 2529 skips
>> 3798 dezicycles in dct32_float, 8385870 runs, 2738 skips
>>
>>>> -#define ADD(a, b) tab[a] += tab[b]
>>>> +#define ADD(a, b) val##a += val##b
>>>>
>>>> +
>>>> +#define SWAPSUM(a,b,c)\
>>>> +{\
>>>> +    FFSWAP(INTFLOAT, val##a, val##b);\
>>>> +    ADD(a, c);                     \
>>>> +}
>>>
>>> Swapping variables is always a redundant operation in code lacking
>>> backward branches.
>>
>> That's true, but I was expecting the compiler to optimize it out. The code
>> was written this way to match my SSE version, in which the same macro
>> did FFSWAP(float, out[a], out[b]);. But it is better not to trust the
>> compiler, so a new version is attached.
>>
>> -Vitor
>
>>   mpegaudiodec.c |  129 ++++++++++++++++++++++++++++++---------------------------
>>   1 file changed, 70 insertions(+), 59 deletions(-)
>> e20a907b0ee1bd2611bde7717e8c7807fd7fd42c  mp3_dct32_2.diff
>
> ok

Applied.

-Vitor
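
To put the quoted timings in perspective, the three final readings of each configuration can be averaged and compared. The sketch below does only that arithmetic; the helper names (mean, percent_saved, fixed_saving, float_saving) are made up for illustration and are not part of FFmpeg, and the numbers are copied from the benchmark lines above.

```c
#include <stddef.h>

/* Hypothetical helper: arithmetic mean of n timer readings (dezicycles). */
static double mean(const int *v, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum / (double)n;
}

/* Hypothetical helper: percentage of time saved going from before to after. */
static double percent_saved(double before, double after)
{
    return 100.0 * (before - after) / before;
}

/* Final readings quoted in the thread, three runs per configuration. */
static const int fixed_patched[]  = { 3847, 3822, 3846 };
static const int fixed_original[] = { 4488, 4473, 4485 };
static const int float_patched[]  = { 3384, 3494, 3451 };
static const int float_original[] = { 3781, 3766, 3798 };

/* Saving for the fixed-point dct32(). */
static double fixed_saving(void)
{
    return percent_saved(mean(fixed_original, 3), mean(fixed_patched, 3));
}

/* Saving for the floating-point dct32_float(). */
static double float_saving(void)
{
    return percent_saved(mean(float_original, 3), mean(float_patched, 3));
}
```

With these inputs the fixed-point version comes out a little over 14% faster and the floating-point version a little under 9% faster, assuming the three runs are representative.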

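The point about swaps being redundant in straight-line code can be illustrated with a small sketch. It assumes int values instead of INTFLOAT and uses a hypothetical SWAP macro as a stand-in for FFSWAP; it is not the code from the patch, only a toy with the same shape. Since there is no loop, a swap can always be replaced by referring to the other variable name in the code that follows.

```c
/* Hypothetical stand-in for FFSWAP; not FFmpeg's actual definition. */
#define SWAP(type, a, b) do { type tmp_ = (a); (a) = (b); (b) = tmp_; } while (0)

/* Token pasting: ADD(0, 2) expands to val0 += val2. */
#define ADD(a, b) val##a += val##b

/* Swap-based form, shaped like the first version of the patch. */
#define SWAPSUM(a, b, c) do { SWAP(int, val##a, val##b); ADD(a, c); } while (0)

/* Explicit swap of two locals followed by an add. */
static int with_swap(int x0, int x1, int x2)
{
    int val0 = x0, val1 = x1, val2 = x2;
    SWAPSUM(0, 1, 2);        /* val0 = x1 + x2, val1 = x0 */
    return val0 * 3 + val1;
}

/* Same result without a swap: the names val0 and val1 trade roles. */
static int without_swap(int x0, int x1, int x2)
{
    int val0 = x0, val1 = x1, val2 = x2;
    ADD(1, 2);               /* val1 = x1 + x2, val0 untouched */
    return val1 * 3 + val0;
}
```

Both functions compute the same value for any inputs; the second simply does the renaming by hand rather than relying on the compiler to eliminate the swap.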