[MPlayer-dev-eng] Re: Re: PATCH [0/12] CoreAVCDecoder support

Sun Feb 18 05:13:14 CET 2007

On Sun, Feb 18, 2007 at 01:42:02AM +0100, Michael Niedermayer wrote:
> > > well i guess that the choice of compiler does have some effect on the speed
> > > gcc is not a particularly good choice ... and there's a lot more c code in h264
> > > decoding than in mpeg4 asp, rewriting it all in asm by hand does of course
> > > work but it doesn't seem like the right solution
> > 
> > imo relying on the compiler to optimize the performance-critical parts
> > of your code isn't the right solution. 
> 
> this depends on the size of these parts, look in h264.c, rewrite half of
> it in asm for every cpu supported ...

that would be how many... 2? 3?
while i agree it's always wrong to write asm only and not portable
code, i also think it's a bit silly to think of asm as an O(n) task
where n is a large number of targets. basically everything but x86 is
dead for the time being. maybe once free systems achieve world
domination and windows and mac binaryware are irrelevant, then we can
think about alternative (superior) archs again, but for now everything
else is such a joke in performance-per-cost and even in
performance-per-watt (casual estimates a friend and i made last night
suggest that an athlon underclocked and undervolted to 500mhz or less
would use comparable wattage to 'embedded' cpus like arm that have
1/10 the performance).

> this won't happen and it shouldn't, it's the compiler's job, this is not
> 2 pages of code, it's more like 200 pages of c code ...

bleh.. :(
is there any sane way to isolate the parts that are actually the most
performance-intensive and only write them in asm? or is h264 just THAT
idiotic that it has 200 pages of performance-intensive code due to
massive overcomplexity?

> see above, h264 is messy, there is a lot of code outside MC/DCT/CABAC which
> is executed per block or macroblock. the whole mb loop almost certainly
> doesn't fit in the L1 code cache, which is likely one serious bottleneck,
> especially for the crap tracecache P4 which is limited to 1 instruction
> per cpu cycle if the stuff isn't in the code cache ...
> the luma MC code alone at -O3 was something like 70kb, that's also why it's
> faster at -O2 or with disabled inlining as it is now in svn
> 
> maybe someone should try benchmarking h264.c compiled with -Os ?

hmm sounds like a decent idea..
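
for illustration, here is a minimal sketch of the "disabled inlining" idea
quoted above, assuming gcc's noinline function attribute. the names
(NOINLINE, avg_row8, avg_block8x8) are made up for this example and are
not taken from h264.c; the point is only that a helper kept out of line is
emitted once instead of being copied into every caller, which is one way
to keep the hot loop's code size down.

```c
#include <stdint.h>

/* NOINLINE is a hypothetical helper macro for this sketch.
 * On GCC-compatible compilers it expands to the noinline attribute;
 * elsewhere it expands to nothing and the compiler decides. */
#if defined(__GNUC__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

/* Stand-in for a motion-compensation helper (illustrative only):
 * rounded average of two rows of 8 pixels. */
static NOINLINE void avg_row8(uint8_t *dst, const uint8_t *a, const uint8_t *b)
{
    for (int i = 0; i < 8; i++)
        dst[i] = (uint8_t)((a[i] + b[i] + 1) >> 1);
}

/* Per-block caller. Because avg_row8 is not inlined, its body exists
 * once in the binary no matter how many call sites use it. */
void avg_block8x8(uint8_t *dst, const uint8_t *a, const uint8_t *b, int stride)
{
    for (int y = 0; y < 8; y++)
        avg_row8(dst + y * stride, a + y * stride, b + y * stride);
}
```

whether this actually helps has to be measured: building the same file at
-O2, -O3 and -Os and comparing both object size and decode time is the
experiment suggested above.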

> > rereading your message, i get the idea that maybe you're just saying
> > the difference between coreavc and lavc decoder could be explained
> > away by differences in the compiler, and not that people "should" be
> > using intel cc. if so, sorry for getting ot.
> 
> yes, i am trying to understand why coreavc is faster, i am not advocating
> the use of a proprietary compiler or codec, though i don't think it's just
> the compiler ...

:)

rich