[FFmpeg-devel] [PATCH] SSE dct32() [Was: r23095 - in trunk/libavcodec: ...]

Michael Niedermayer michaelni
Tue Jun 8 16:04:43 CEST 2010


On Tue, Jun 08, 2010 at 12:56:16PM +0200, Vitor Sessak wrote:
> On 06/08/2010 01:52 AM, Michael Niedermayer wrote:
>> On Sat, Jun 05, 2010 at 07:35:29AM +0200, Vitor Sessak wrote:
>>> Moving discussion to -devel...
>>>
>>> On 05/31/2010 09:59 PM, Vitor Sessak wrote:
>>>> On 05/14/2010 05:52 PM, Michael Niedermayer wrote:
>>>>> On Fri, May 14, 2010 at 08:39:48AM +0200, Vitor Sessak wrote:
>>>>>> Michael Niedermayer wrote:
>>>>>>> On Tue, May 11, 2010 at 03:56:45PM -0400, Alex Converse wrote:
>>>>>>>> On Tue, May 11, 2010 at 3:52 PM, michael<subversion at mplayerhq.hu>
>>>>>>>> wrote:
>>>>>>>>> Author: michael
>>>>>>>>> Date: Tue May 11 21:52:42 2010
>>>>>>>>> New Revision: 23095
>>>>>>>>>
>>>>>>>>> Log:
>>>>>>>>> float based mp1/mp2/mp3 decoders.
>>>>>>>>>
>>>>>>>> Thanks
>>>>>>> :)
>>>>>>> btw, any volunteers to try to hook it up to our split radix dct and 
>>>>>>> or
>>>>>>> simd optimize it?
>>>>>>
>>>>>> Without rdft or dct simd, our split radix code is slower. Ugly hack
>>>>>> to test
>>>>>> it attached.
>>>>>
>>>>> if dct32() is faster then it should be used by our generic dct code.
>>>>> at least for the plain C case
>>>>
>>>> I've given a try at a SSE dct32(). It is much faster than current C
>>>> code. The only problem is that current code in mpegaudiodec.c expect two
>>>> arguments, one input (which is destructed) and one output. ITOH,
>>>> ff_dct_calc() does everything in-place, what does not glue well with the
>>>> current mpegaudiodec.c code. Can you (or anyone else that knows
>>>> mpegaudiodec.c well) fix it?
>>>
>>> I've given a try of making mpegaudiodec.c use the same buffer for dct 
>>> input
>>> and output and it is not trivial. It is much easier (and has no 
>>> measurable
>>> slowdown) to make ff_dct_calc() take both an input and an output pointer 
>>> as
>>> in attached patch.
>>>
>>> -Vitor
>>
>>>   avfft.c     |    2 +-
>>>   binkaudio.c |    2 +-
>>>   dct.c       |   40 +++++++++++++++++++++++-----------------
>>>   fft-test.c  |    6 ++----
>>>   fft.h       |   11 +++++++++--
>>>   wmavoice.c  |    4 ++--
>>>   6 files changed, 38 insertions(+), 27 deletions(-)
>>> 91cf0cde9a50a47a8df3fbd171b35535abe00d16  dct_inout.diff
>>
>> ok if tested and no slowdown is confirmed
>
> I retested carefully and found a 3% slowdown. It is due to aliasing, which 
> does not allow the compiler to unroll the loops. I tested unrolling by hand 
> the loops and afterwards it is as fast as before.
>
> Are you ok with the patch as is or ok if I apply another patch afterwards 
> unrolling the loops?

i think that a 3% speedloss is significant so iam definitly not ok with
something that leads to such speedloss.

also if yu test this patch + unroll against svn, i wonder how
svn+unroll performs
as well as what code cache effects the unroll actually has in actual use

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Rewriting code that is poorly written but fully understood is good.
Rewriting code that one doesnt understand is a sign that one is less smart
then the original author, trying to rewrite it will not make it better.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100608/52fda18f/attachment.pgp>



More information about the ffmpeg-devel mailing list