[FFmpeg-devel] [PATCH] Speed up dct32() in mpegaudiodec and make it avoid trashing its input

Michael Niedermayer michaelni
Mon Jun 7 12:56:53 CEST 2010


On Mon, Jun 07, 2010 at 08:33:36AM +0200, Vitor Sessak wrote:
> On 06/07/2010 02:31 AM, Michael Niedermayer wrote:
>> On Sun, Jun 06, 2010 at 04:16:27PM +0200, Vitor Sessak wrote:
>>> $subj. This should make the function suitable to be moved to the common
>>> DCT framework after my patch in the thread "[PATCH] SSE dct32()".
>>>
>>> Benchmarks:
>>>
>>> Fixed point, patched:
>>> 4554 dezicycles in dct32, 128 runs, 0 skips
>>> 4880 dezicycles in dct32, 256 runs, 0 skips
>>> 5078 dezicycles in dct32, 512 runs, 0 skips
>>> 4443 dezicycles in dct32, 1024 runs, 0 skips
>>> 4112 dezicycles in dct32, 2048 runs, 0 skips
>>> 4122 dezicycles in dct32, 4095 runs, 1 skips
>>> 4054 dezicycles in dct32, 8190 runs, 2 skips
>>> 4008 dezicycles in dct32, 16379 runs, 5 skips
>>> 3968 dezicycles in dct32, 32759 runs, 9 skips
>>> 3911 dezicycles in dct32, 65516 runs, 20 skips
>>> 3868 dezicycles in dct32, 131042 runs, 30 skips
>>> 3844 dezicycles in dct32, 262075 runs, 69 skips
>>> 3860 dezicycles in dct32, 524151 runs, 137 skips
>>> 3881 dezicycles in dct32, 1048328 runs, 248 skips
>>> 3852 dezicycles in dct32, 2096579 runs, 573 skips
>>> 3838 dezicycles in dct32, 4193100 runs, 1204 skips
>>> 3831 dezicycles in dct32, 8386205 runs, 2403 skips
>>
>> seeing the whole output is not interesting; seeing the last score
>> of several runs is interesting
>
> ok.
>
> Fixed point, patched:
> 3847 dezicycles in dct32, 8386234 runs, 2374 skips
> 3822 dezicycles in dct32, 8386575 runs, 2033 skips
> 3846 dezicycles in dct32, 8386386 runs, 2222 skips
>
> Floating point, patched:
> 3384 dezicycles in dct32_float, 8386658 runs, 1950 skips
> 3494 dezicycles in dct32_float, 8386603 runs, 2005 skips
> 3451 dezicycles in dct32_float, 8386525 runs, 2083 skips
>
> Fixed point, original:
> 4488 dezicycles in dct32, 8385764 runs, 2844 skips
> 4473 dezicycles in dct32, 8386027 runs, 2581 skips
> 4485 dezicycles in dct32, 8386185 runs, 2423 skips
>
> Floating point, original:
> 3781 dezicycles in dct32_float, 8386360 runs, 2248 skips
> 3766 dezicycles in dct32_float, 8386079 runs, 2529 skips
> 3798 dezicycles in dct32_float, 8385870 runs, 2738 skips
>
>>> -#define ADD(a, b) tab[a] += tab[b]
>>> +#define ADD(a, b) val##a += val##b
>>>
>>> +
>>> +#define SWAPSUM(a,b,c)\
>>> +{\
>>> +    FFSWAP(INTFLOAT, val##a, val##b);\
>>> +    ADD(a, c);                     \
>>> +}
>>
>> swapping variables is always a redundant operation in code lacking
>> backward branches.
>
> It's true, but I was expecting the compiler to optimize it out. The code 
> was done this way to match the code in my SSE version, in which the same 
> macro did FFSWAP(float, out[a], out[b]);. But it is better not to trust the 
> compiler and a new version is attached.
>
> -Vitor

>  mpegaudiodec.c |  129 ++++++++++++++++++++++++++++++---------------------------
>  1 file changed, 70 insertions(+), 59 deletions(-)
> e20a907b0ee1bd2611bde7717e8c7807fd7fd42c  mp3_dct32_2.diff

ok

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Why not whip the teacher when the pupil misbehaves? -- Diogenes of Sinope