[FFmpeg-devel] [PATCH] SSE dct32() [Was: r23095 - in trunk/libavcodec: ...]
Vitor Sessak
vitor1001
Tue Jun 8 12:56:16 CEST 2010
On 06/08/2010 01:52 AM, Michael Niedermayer wrote:
> On Sat, Jun 05, 2010 at 07:35:29AM +0200, Vitor Sessak wrote:
>> Moving discussion to -devel...
>>
>> On 05/31/2010 09:59 PM, Vitor Sessak wrote:
>>> On 05/14/2010 05:52 PM, Michael Niedermayer wrote:
>>>> On Fri, May 14, 2010 at 08:39:48AM +0200, Vitor Sessak wrote:
>>>>> Michael Niedermayer wrote:
>>>>>> On Tue, May 11, 2010 at 03:56:45PM -0400, Alex Converse wrote:
>>>>>>> On Tue, May 11, 2010 at 3:52 PM, michael <subversion at mplayerhq.hu>
>>>>>>> wrote:
>>>>>>>> Author: michael
>>>>>>>> Date: Tue May 11 21:52:42 2010
>>>>>>>> New Revision: 23095
>>>>>>>>
>>>>>>>> Log:
>>>>>>>> float based mp1/mp2/mp3 decoders.
>>>>>>>>
>>>>>>> Thanks
>>>>>> :)
>>>>>> btw, any volunteers to try to hook it up to our split-radix DCT and/or
>>>>>> SIMD-optimize it?
>>>>>
>>>>> Without RDFT or DCT SIMD, our split-radix code is slower. An ugly hack
>>>>> to test it is attached.
>>>>
>>>> If dct32() is faster, then it should be used by our generic DCT code,
>>>> at least for the plain C case.
>>>
>>> I've tried writing an SSE dct32(). It is much faster than the current C
>>> code. The only problem is that the current mpegaudiodec.c code expects
>>> dct32() to take two arguments: one input (which is destroyed) and one
>>> output. OTOH, ff_dct_calc() does everything in place, which does not fit
>>> well with the current mpegaudiodec.c code. Can you (or anyone else who
>>> knows mpegaudiodec.c well) fix it?
>>
>> I tried making mpegaudiodec.c use the same buffer for DCT input and
>> output, and it is not trivial. It is much easier (and causes no measurable
>> slowdown) to make ff_dct_calc() take both an input and an output pointer,
>> as in the attached patch.
>>
>> -Vitor
>
>> avfft.c | 2 +-
>> binkaudio.c | 2 +-
>> dct.c | 40 +++++++++++++++++++++++-----------------
>> fft-test.c | 6 ++----
>> fft.h | 11 +++++++++--
>> wmavoice.c | 4 ++--
>> 6 files changed, 38 insertions(+), 27 deletions(-)
>> 91cf0cde9a50a47a8df3fbd171b35535abe00d16 dct_inout.diff
>
> ok if tested and no slowdown is confirmed
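The thread turns on two ways of exposing a transform: in place (one buffer, as ff_dct_calc() worked before the patch) or with separate input and output pointers (as mpegaudiodec.c expects of dct32()). A minimal sketch, using hypothetical names (butterfly_inout, butterfly_inplace) and a single mirrored-pair butterfly stage as a stand-in for a full DCT, shows why the out-of-place form is the more general one: an in-place caller can always be served through a scratch copy, while the out-of-place loop itself gives wrong results if called with the same buffer for input and output. It is not FFmpeg's DCT code and does not model dct32() destroying its input.

```c
#include <assert.h>
#include <string.h>

#define N 32

/* Hypothetical out-of-place butterfly stage (stand-in for a full DCT):
 * sums of mirrored pairs go to the first half, differences to the second.
 * 'in' and 'out' must not overlap: later iterations read in[N-1-i] slots
 * that earlier iterations would already have overwritten if out == in. */
static void butterfly_inout(float *out, const float *in)
{
    assert(out != in); /* catches exact aliasing only, not partial overlap */
    for (int i = 0; i < N / 2; i++) {
        out[i]         = in[i] + in[N - 1 - i];
        out[N / 2 + i] = in[i] - in[N - 1 - i];
    }
}

/* Hypothetical in-place wrapper: copy the data to a scratch buffer, then
 * let the out-of-place routine write the result back into 'data'. */
static void butterfly_inplace(float *data)
{
    float tmp[N];
    memcpy(tmp, data, sizeof(tmp));
    butterfly_inout(data, tmp);
}
```

The wrapper costs one 32-float copy per call. In the thread the awkward direction was the reverse one (adapting mpegaudiodec.c's separate buffers to an in-place API), which is why changing ff_dct_calc() turned out to be easier.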
I retested carefully and found a 3% slowdown. It is caused by aliasing:
since the input and output pointers may now overlap, the compiler does not
unroll the loops. I tested unrolling the loops by hand, and afterwards the
code is as fast as before. Is the patch OK as is, or is it OK if I apply a
follow-up patch afterwards that unrolls the loops?
-Vitor