[Mplayer-cvslog] CVS: main/DOCS/tech TODO,1.9,1.10

Michael Niedermayer michaelni at gmx.at
Thu Dec 13 19:28:35 CET 2001


Hi

On Thursday 13 December 2001 18:05, Nick Kurshev wrote:
[...]
> > the only possible way to really figure out which is faster is to code
> > both and benchmark them on different cpus, but i doubt that it is worth
> > it because 1. it only affects runtime cpu detection
> > 2. a single memory access (misses L1&L2 cache) needs 50 cpu cycles or so
> > so even if your variant turns out to be faster on some cpu the difference
> > would be tiny
>
> I didn't understand you.
> It seems that you are catching 10-20 micro OPS per 5-7 OPS.
> But - do you know that an ordinary memcpy takes 100 000 - 500 000 cpu
> clocks? (I don't mean memcpy of 64 byte block)
yes, if you copy a large amount of data ... so optimizations of the call
itself make even less sense then

>
> My issues:
> 1) cache pollution due to function inlining (significant for any cpu)
yes, if they are inlined and the code is full of memcpy() calls, which it
isn't afaik; but the compiler shouldn't inline them anyway

>    else you'll get a dramatic performance loss. (As you write - 50 cpu
> cycles) 
well, dramatic is perhaps a bit too dramatic ;) but it would indeed be
suboptimal. 50 is not realistic ... no one calls memcpy to copy 1 byte,
and they shouldn't be inlined, as i said already

> 2) wrong prediction in 75% of cases
>    due to uncached code and data
ehh 75%? even a random number generator would do better ;)

> 3) just not an elegant solution
rewrite it if you like; if yours is not slower and looks more elegant, i am happy
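
A minimal sketch of what "code both and benchmark them" could look like. It assumes the two variants differ only in how the CPU-specific copy is selected at runtime: a flag test on every call versus an indirect call through a function pointer. All names here (copy_branch, copy_indirect, have_simd, bench) are hypothetical and not taken from the MPlayer source, and both "optimized" routines are plain memcpy stand-ins, so only the dispatch overhead would differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

typedef void *(*copy_fn)(void *, const void *, size_t);

/* Hypothetical stand-ins for CPU-specific copy routines.
   Both simply forward to memcpy, so only the dispatch cost differs. */
static void *copy_generic(void *d, const void *s, size_t n) { return memcpy(d, s, n); }
static void *copy_simd(void *d, const void *s, size_t n)    { return memcpy(d, s, n); }

/* Assumption: runtime CPU detection would set these once at startup. */
static int have_simd = 0;
static copy_fn copy_ptr = copy_generic;

/* Variant A: test a flag on every call. */
void *copy_branch(void *d, const void *s, size_t n)
{
    return have_simd ? copy_simd(d, s, n) : copy_generic(d, s, n);
}

/* Variant B: indirect call through a pointer chosen at startup. */
void *copy_indirect(void *d, const void *s, size_t n)
{
    return copy_ptr(d, s, n);
}

/* Time `iters` copies of `size` bytes with `fn`; returns CPU seconds. */
double bench(copy_fn fn, size_t size, long iters)
{
    static unsigned char src[65536], dst[65536];
    clock_t t0, t1;
    long i;

    assert(size <= sizeof src);
    memset(src, 0xA5, size);

    t0 = clock();
    for (i = 0; i < iters; i++) {
        src[0] = (unsigned char)i;   /* vary the input between iterations */
        fn(dst, src, size);
    }
    t1 = clock();

    /* Sanity check: the last copy must have reproduced the source. */
    if (iters > 0)
        assert(memcmp(dst, src, size) == 0);

    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

Calling bench(copy_branch, 64, n) and bench(copy_indirect, 64, n) with a large n, and again with larger sizes, would give a first impression on a given cpu. clock() is coarse and an optimizing compiler may simplify the loop, so the numbers are only a rough guide.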

[...]

Michael


