[FFmpeg-devel] [PATCH] SSE dct32()

Måns Rullgård mans
Sun Jun 20 14:12:48 CEST 2010


Vitor Sessak <vitor1001 at gmail.com> writes:

> On 06/20/2010 01:33 PM, Måns Rullgård wrote:
>> Vitor Sessak <vitor1001 at gmail.com> writes:
>>
>>> On 06/20/2010 12:15 PM, Måns Rullgård wrote:
>>>> Vitor Sessak <vitor1001 at gmail.com> writes:
>>>>
>>>>>>> I don't remember seeing a big difference _for the dct32 code_ between in ==
>>>>>>> out and in != out.
>>>>>>
>>>>>> Now I am confused; I thought the 3% you quoted was about in == out vs.
>>>>>> in != out?
>>>>>
>>>>> No, the 3% slowdown was when converting our general code (using FFT)
>>>>> to have in != out.
>>>>
>>>> And that was due to missed optimisations caused by gcc not knowing
>>>> that those pointers don't alias each other.  Marking them restrict is
>>>> not good either, since we actually want to pass the same value
>>>> sometimes.
>>>
>>> That and one extra used register.
>>
>> So what do we do?  I see the following options:
>>
>> 1. Change mp3 decoder to work with inplace transform.
>
> Looks hard to do without a speed loss

Just hard or impossible?

>> 2. Copy the block before doing inplace transform.
>
> Speed loss

Yes, of course.  I was merely listing every option, good or bad.
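
For reference, option 2 amounts to something like the sketch below. The
names (transform_inplace, transform_copy) are made up for illustration and
the "transform" is just a placeholder scaling loop, not the real DCT. The
point is only that the out-of-place interface costs one extra pass over
the block whenever out != in.

```c
#include <string.h>

/* Hypothetical stand-in for the real in-place transform (NOT the DCT):
 * it only doubles every sample so the data flow is easy to follow. */
static void transform_inplace(float *data, int n)
{
    for (int i = 0; i < n; i++)
        data[i] *= 2.0f;
}

/* Option 2: an out-of-place interface built on the in-place routine.
 * The memcpy is the extra work that shows up as the speed loss; it is
 * skipped when the caller passes the same buffer twice. */
static void transform_copy(float *out, const float *in, int n)
{
    if (out != in)
        memcpy(out, in, (size_t)n * sizeof(*out));
    transform_inplace(out, n);
}
```

A caller that needs the input preserved would use
transform_copy(out, in, n); one that does not can pass the same pointer
for both arguments and pay nothing for the copy.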

>> 3. Apply magic to remove slowdown from splitting in/out.
>> Did I miss anything?
>
> Yes:
>
> 4. Have a special function pointer only for the 32-point DCT accepting
> in != out as in my patch in this thread (dct32_new.diff). Note that
> for the function for 32-point DCT (and only for it) in != out does not
> give a noticeable speed loss.

I'm sure you also see the slight ugliness in this.  If it's the only
sane solution, so be it, but I'd prefer something nicer.
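
To make option 4 concrete, here is a rough sketch of what a separate
function pointer could look like. All names (TransformContext,
calc_inplace, calc_fixed, butterfly4) are hypothetical and are not taken
from the patch, and the fixed-size kernel is a 4-point butterfly stage
rather than a 32-point DCT. It assumes that a small fixed-size kernel can
load all its inputs into locals before storing anything, which would make
out == in safe without restrict; whether that is the actual reason dct32
shows no slowdown is a guess.

```c
/* Hypothetical context, for illustration only. */
typedef struct TransformContext {
    /* General sizes: transform the buffer in place. */
    void (*calc_inplace)(float *data, int n);
    /* Fixed size only: separate pointers, but out may equal in. */
    void (*calc_fixed)(float *out, const float *in);
} TransformContext;

/* Placeholder for the general in-place code path (not a real transform). */
static void scale_inplace(float *data, int n)
{
    for (int i = 0; i < n; i++)
        data[i] *= 0.5f;
}

/* 4-point butterfly stage. Every input is read into a local before the
 * first store, so the result is the same whether or not out aliases in,
 * and the compiler never has to reload an input after a store. */
static void butterfly4(float *out, const float *in)
{
    const float a = in[0], b = in[1], c = in[2], d = in[3];

    out[0] = a + d;
    out[1] = b + c;
    out[2] = b - c;
    out[3] = a - d;
}

static void transform_init(TransformContext *ctx)
{
    ctx->calc_inplace = scale_inplace;
    ctx->calc_fixed   = butterfly4;
}
```

The ugliness is visible in the struct itself: two entry points with
different calling conventions, one of which only exists for a single
transform size.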

-- 
Måns Rullgård
mans at mansr.com
