[FFmpeg-devel] [PATCH] SSE dct32()
Vitor Sessak
vitor1001
Sun Jun 20 14:59:04 CEST 2010
On 06/20/2010 02:12 PM, Måns Rullgård wrote:
> Vitor Sessak <vitor1001 at gmail.com> writes:
>
>> On 06/20/2010 01:33 PM, Måns Rullgård wrote:
>>> Vitor Sessak <vitor1001 at gmail.com> writes:
>>>
>>>> On 06/20/2010 12:15 PM, Måns Rullgård wrote:
>>>>> Vitor Sessak <vitor1001 at gmail.com> writes:
>>>>>
>>>>>>>> I don't remember seeing a big difference _for the dct32 code_ between in ==
>>>>>>>> out and in != out.
>>>>>>>
>>>>>>> Now I am confused; I thought the 3% you quoted was about in == out
>>>>>>> vs. in != out?
>>>>>>
>>>>>> No, the 3% slowdown was when converting our general code (using FFT)
>>>>>> to have in != out.
>>>>>
>>>>> And that was due to missed optimisations caused by gcc not knowing
>>>>> that those pointers don't alias each other. Marking them restrict is
>>>>> not good either, since we actually want to pass the same value
>>>>> sometimes.
>>>>
>>>> That and one extra used register.
>>>
>>> So what do we do? I see the following options:
>>>
>>> 1. Change mp3 decoder to work with inplace transform.
>>
>> Looks hard to do without a speed loss.
>
> Just hard or impossible?
>
>>> 2. Copy the block before doing inplace transform.
>>
>> Speed loss
>
> Yes, of course. I was merely listing every option, good or bad.
>
>>> 3. Apply magic to remove slowdown from splitting in/out.
>>>
>>> Did I miss anything?
>>
>> Yes:
>>
>> 4. Have a special function pointer only for the 32-point DCT that
>> accepts in != out, as in my patch in this thread (dct32_new.diff).
>> Note that for the 32-point DCT function (and only for it), in != out
>> does not cause a noticeable speed loss.
>
> I'm sure you also see the slight ugliness in this. If it's the only
> sane solution, so be it, but I'd prefer something nicer.
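A minimal sketch contrasting options 2 and 4 from the list above, assuming a float `FFTSample` as in FFmpeg. `toy_dct_inplace`, `toy_dct32`, `dct32_copy_then_inplace` and `ToyDSPContext` are hypothetical stand-ins, not FFmpeg code; the toy "transform" (reverse the 32 samples and scale by 2) is not a real DCT and only exercises the buffer handling.

```c
#include <string.h>

/* Assumption: float samples, as in FFmpeg's default FFTSample. */
typedef float FFTSample;

/* Hypothetical stand-in for an existing in-place transform: reverses
 * the 32 samples and scales them by 2. Not a real DCT. */
static void toy_dct_inplace(FFTSample *data)
{
    for (int i = 0; i < 16; i++) {
        FFTSample t  = data[i];
        data[i]      = 2.0f * data[31 - i];
        data[31 - i] = 2.0f * t;
    }
}

/* Option 2: keep only the in-place transform and copy the input first.
 * The extra 32-sample copy is the speed loss mentioned in the thread.
 * memmove() rather than memcpy() keeps overlapping buffers well-defined. */
static void dct32_copy_then_inplace(FFTSample *out, const FFTSample *in)
{
    if (out != in)
        memmove(out, in, 32 * sizeof(*out));
    toy_dct_inplace(out);
}

/* Option 4: a separate out-of-place routine reached through its own
 * function pointer. This toy version requires out and in not to overlap. */
static void toy_dct32(FFTSample *out, const FFTSample *in)
{
    for (int i = 0; i < 32; i++)
        out[i] = 2.0f * in[31 - i];
}

/* Hypothetical context holding the extra pointer; not FFmpeg's API. */
typedef struct ToyDSPContext {
    void (*dct32)(FFTSample *out, const FFTSample *in);
} ToyDSPContext;
```

With `ToyDSPContext c = { toy_dct32 };` a caller would use `c.dct32(out, in)`. Both routes give the same output here; the first pays for a copy on every call, the second for an extra function pointer that only the 32-point case uses.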

Another solution would be to make every function accept in != out and use
inline functions to avoid the slowdown:
static av_always_inline void ff_dct_calc_I_inline_c(DCTContext *ctx,
                                                    FFTSample *in, FFTSample *out)
{
    /* [... actual code ...] */
}

static void ff_dct_calc_I_c(DCTContext *ctx, FFTSample *in, FFTSample *out)
{
    if (in == out)
        ff_dct_calc_I_inline_c(ctx, out, out);
    else
        ff_dct_calc_I_inline_c(ctx, in, out);
}
But it introduces its own ugliness...
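A self-contained sketch of the dispatch idea above, with a hypothetical toy butterfly pass in place of the real `ff_dct_calc_I` body and a local `ALWAYS_INLINE` macro standing in for FFmpeg's `av_always_inline` (assumption: GCC or Clang). Because the helper is force-inlined, the `in == out` branch gets its own copy in which both arguments are the same pointer, so the compiler can keep a single register and faces no unknown aliasing between two distinct pointers. Whether a given GCC version really recovers the old in-place speed would have to be confirmed by benchmarking. The `else` branch gains nothing: knowing `in != out` does not tell the compiler that the two arrays do not overlap.

```c
/* Stand-in for FFmpeg's av_always_inline (assumption: GCC or Clang). */
#if defined(__GNUC__)
#define ALWAYS_INLINE inline __attribute__((always_inline))
#else
#define ALWAYS_INLINE inline
#endif

typedef float FFTSample;  /* assumption: float samples */

/* Hypothetical butterfly pass standing in for the real DCT body.
 * Each pair is read before it is written, so it is correct both in
 * place (in == out) and out of place (non-overlapping buffers).
 * n is assumed to be even. */
static ALWAYS_INLINE void toy_transform_inline(const FFTSample *in,
                                               FFTSample *out, int n)
{
    for (int i = 0; i < n / 2; i++) {
        FFTSample a = in[i];
        FFTSample b = in[n - 1 - i];
        out[i]         = a + b;
        out[n - 1 - i] = a - b;
    }
}

/* Public entry point: one inlined copy of the body per case. */
static void toy_transform(const FFTSample *in, FFTSample *out, int n)
{
    if (in == out)
        toy_transform_inline(out, out, n);  /* single pointer, as before */
    else
        toy_transform_inline(in, out, n);   /* new out-of-place path */
}
```

The sketch also hints at where the ugliness may lie: every transform needs both a wrapper and an inline body, and the benefit depends on the compiler honouring the forced inlining.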
-Vitor