[FFmpeg-cvslog] r16207 - trunk/libavcodec/h264.c
Måns Rullgård
mans
Thu Dec 18 04:57:54 CET 2008
Michael Niedermayer <michaelni at gmx.at> writes:
> On Thu, Dec 18, 2008 at 02:57:17AM +0000, Måns Rullgård wrote:
>> michael <subversion at mplayerhq.hu> writes:
>>
>> > Author: michael
>> > Date: Thu Dec 18 03:53:18 2008
>> > New Revision: 16207
>> >
>> > Log:
>> > Use the new idct functions (except chroma as it was slower in benchmarks)
>> > cathedral +0.5% speed
>> > aladin +0.6% speed [note aladin has been cat-ed 10 times to reduce the influence
>> > of init time]
>> > Speedup also verified via START/STOP_TIMER (difference was very significant
>> > for the changed parts)
>>
>> How much does this hurt on architectures that don't yet have the new
>> SIMD functions?
>
> there are no really new SIMD functions.
> I just moved the loops like
>     for(i=0; i<16; i++)
>         dsp->idct4x4_add(blah blah);
>
> into dsputil so they are
>
>     for(i=0; i<16; i++)
>         idct4x4_add_simdwhatever(blah blah);
>
> that way gcc can inline the function and avoids up to 15 calls through dsp->
>
> adding support for this to your favorite architecture is a matter of copy
> & paste and adjusting the function names.
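
For readers following along, the change described above can be sketched roughly as below. This is only an illustration, not the code from r16207: the names DSPSketch, idct4x4_add_c, idct4x4_add16_c, decode_mb_old, decode_mb_new and sketch_paths_match are all hypothetical, the "IDCT" is a stand-in that merely adds coefficients with clipping, and the block offsets are an assumed raster layout. The point is only where the loop lives: in the old form the decoder makes 16 indirect calls, in the new form it makes one and the compiler is free to inline the per-block routine inside the loop.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for a 4x4 IDCT-and-add (hypothetical): adds 16 coefficients
 * to a 4x4 pixel block and clips the result to 0..255. */
static inline void idct4x4_add_c(uint8_t *dst, const int16_t *block, int stride)
{
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++) {
            int v = dst[y * stride + x] + block[y * 4 + x];
            dst[y * stride + x] = (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
        }
}

/* Hypothetical function-pointer table, loosely modelled on "dsp->". */
typedef struct DSPSketch {
    void (*idct4x4_add)(uint8_t *dst, const int16_t *block, int stride);
    void (*idct4x4_add16)(uint8_t *dst, const int *block_offset,
                          const int16_t *block, int stride);
} DSPSketch;

/* New style: the loop sits next to the implementation, so the compiler
 * can inline idct4x4_add_c into it. */
static void idct4x4_add16_c(uint8_t *dst, const int *block_offset,
                            const int16_t *block, int stride)
{
    for (int i = 0; i < 16; i++)
        idct4x4_add_c(dst + block_offset[i], block + i * 16, stride);
}

/* Old style: the decoder loops and makes 16 calls through the pointer. */
static void decode_mb_old(const DSPSketch *dsp, uint8_t *dst,
                          const int *block_offset, const int16_t *block,
                          int stride)
{
    for (int i = 0; i < 16; i++)
        dsp->idct4x4_add(dst + block_offset[i], block + i * 16, stride);
}

/* New style: a single call through the pointer per macroblock. */
static void decode_mb_new(const DSPSketch *dsp, uint8_t *dst,
                          const int *block_offset, const int16_t *block,
                          int stride)
{
    dsp->idct4x4_add16(dst, block_offset, block, stride);
}

/* Runs both paths on identical input; returns 1 if the results match. */
static int sketch_paths_match(void)
{
    DSPSketch dsp = { idct4x4_add_c, idct4x4_add16_c };
    uint8_t a[16 * 16], b[16 * 16];
    int16_t coeffs[16 * 16];
    int offsets[16];

    for (int i = 0; i < 16; i++)   /* assumed raster order of 4x4 blocks */
        offsets[i] = (i / 4) * 4 * 16 + (i % 4) * 4;
    for (int i = 0; i < 256; i++) {
        a[i] = b[i] = (uint8_t)i;
        coeffs[i] = (int16_t)(i % 7 - 3);
    }
    decode_mb_old(&dsp, a, offsets, coeffs, 16);
    decode_mb_new(&dsp, b, offsets, coeffs, 16);
    return memcmp(a, b, sizeof(a)) == 0;
}
```

Both paths produce the same pixels; the difference is only in call overhead. An architecture that has not yet been given its own 16-block wrapper would, in a sketch like this, need a fallback for the idct4x4_add16 slot, which is the case the question below is about.
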
I can see how it can be done. I'm asking how much of an impact this
has on performance until it's been done. What percentage of the old
calls are affected?
> Of course one could write the loop in asm, and I am sure it would be faster,
> but I didn't do this ...
> Also this is all a little new and I cannot yet guarantee that the API is
> stable; though I do not have any plans to change it, I might stumble
> across further possible improvements ...
In the future, if you do something similar that causes a performance
regression on some architectures, I'd appreciate an advance warning,
preferably including a patch. That would let me update my code and
avoid a (temporary) performance drop.
--
Måns Rullgård
mans at mansr.com