[Ffmpeg-devel] Re: fastmemcpy in ffmpeg

Gunnar von Boehn
Tue Sep 26 12:30:10 CEST 2006


Hi,

Rich Felker wrote:

>>>Just by
>>>adding one prefetch instruction to the normal Linux memcpy you can speed
>>>it up a lot 50%.
>>
>>[..]
>>
>>So why don't you submit such work for inclusion in glibc? That way,
>>everybody profits!
> 
> 
> No, multimedia apps profit and everyone else loses. fastmemcpy is
> several times slower for tiny copies, which are the only thing that
> _normal_ apps ever do. The only type of memcpy that belongs in libc is
> the ultra-trivial implementation which (on the i386 family) happens to
> also be the fastest implementation that works on all cpu generations.
> Anything like fastmemcpy requires either cpu-specific libc or runtime
> cpudetect, the former of which is probably not acceptable for most
> users and the latter of which will be horribly slow for the common
> cases...

I have to disagree, politely.

- A CPU-optimized version will easily be faster
   than the normal version for sizes above 64 to 128 bytes.

- An optimized version will be about twice as fast
   for sizes above 500 bytes to 1 KB.

- The added overhead for every memcpy call is just one
   " if( size>128 ){ ". If you arrange this branch so that it
   defaults (falls through) to the small-size routine, then
   the "if" costs 1 clock or less on many CPUs. This overhead
   is negligible.
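
To make the branch idea concrete, here is a minimal sketch in C. All
names (my_memcpy, copy_small, copy_large, LARGE_COPY_THRESHOLD) are
made up for illustration, the 128-byte cutoff is only the figure from
the text and would need tuning per CPU, and copy_large simply forwards
to the libc memcpy as a stand-in for a real tuned routine. The branch
hint relies on the GCC/Clang extension __builtin_expect.

```c
#include <stddef.h>
#include <string.h>

/* Assumed cutoff taken from the text; a real value must be measured. */
#define LARGE_COPY_THRESHOLD 128

/* Branch hint: GCC/Clang extension, a no-op on other compilers. */
#if defined(__GNUC__)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define UNLIKELY(x) (x)
#endif

/* Trivial byte loop, used for the small sizes. */
static void *copy_small(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}

/* Hypothetical stand-in for a CPU-tuned bulk copy (prefetch, wide
 * loads/stores). Here it only forwards to the libc memcpy. */
static void *copy_large(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* One compare-and-branch in front of the small-copy path. The hint
 * asks the compiler to lay out the small path as the fall-through. */
void *my_memcpy(void *dst, const void *src, size_t n)
{
    if (UNLIKELY(n > LARGE_COPY_THRESHOLD))
        return copy_large(dst, src, n);
    return copy_small(dst, src, n);
}
```

Whether the compiler really emits the small path as the fall-through
depends on the compiler and flags, so this should be checked in the
generated assembly and not taken for granted.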


Please note that the ultra-trivial implementation is the fastest
implementation only on CPUs without a 2nd-level cache.
It is really slow on CPUs with a 2nd-level cache.


If you want to see examples of very efficient handling of such cases,
and of how to install optimized routines at runtime, then please have
a look at the source of Mac OS X.
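
As a rough illustration of installing a routine at runtime (this is
not the Mac OS X code, just a generic sketch in C), one can route the
call through a function pointer that is resolved on first use. The
names dispatch_memcpy, copy_generic, copy_tuned and cpu_has_fast_path
are hypothetical, the CPU probe is a placeholder that always answers
yes, and copy_tuned forwards to the libc memcpy in place of a real
tuned routine.

```c
#include <stddef.h>
#include <string.h>

typedef void *(*copy_fn)(void *, const void *, size_t);

/* Portable fallback: plain byte loop. */
static void *copy_generic(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}

/* Hypothetical stand-in for a CPU-specific routine. */
static void *copy_tuned(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Placeholder probe. A real one would query the CPU (e.g. CPUID on
 * x86); this one always reports that the tuned path is usable. */
static int cpu_has_fast_path(void)
{
    return 1;
}

static void *copy_resolve(void *dst, const void *src, size_t n);

/* Starts out pointing at the resolver, so detection runs only once. */
static copy_fn active_copy = copy_resolve;

/* First call lands here: pick a routine, install it, then do the copy.
 * Not synchronized; threads racing here would all store a valid
 * pointer, but a real libc would resolve this before threads start. */
static void *copy_resolve(void *dst, const void *src, size_t n)
{
    active_copy = cpu_has_fast_path() ? copy_tuned : copy_generic;
    return active_copy(dst, src, n);
}

/* After the first call the cost is one indirect call per copy. */
void *dispatch_memcpy(void *dst, const void *src, size_t n)
{
    return active_copy(dst, src, n);
}
```

The point of the design is that the detection cost is paid once and
not on every call; what remains per call is the indirect jump.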

I think we should not go into this here, as it's getting off-topic.


Cheers
Gunnar



