[MPlayer-dev-eng] Re: Compile options

laurent wozniak laurent.wozniak at laposte.net
Sun Oct 1 19:35:51 CEST 2006


Trent Piepho wrote:
> It seems like there are some functions which h264 inlines that cause the
> code size to grow very large, affecting the cache miss rate enough to make
> inlining slower.  Why doesn't this also make mpeg4 slower?  I can think
> of two reasons:
> 1. Most functions get faster with inlining, but some get slower.  mpeg4
>    either doesn't use, or uses less, the functions which get slower.  Or h264
>    doesn't use, or uses less, the functions which get faster.
> 2. mpeg4 has a smaller code size elsewhere, and so its working set still
>    fits in the cache with the larger functions.  h264's working set size is
>    larger, and so inlining has an overall negative impact.
>
> If the first reason is the cause, then attribute((noinline)) could be added
> to the functions which get slower with inlining.  Comparing the code size
> of each exported function of dsputil_mmx.c with and without
> -finline-functions will find which functions get much larger.  Then one
> must find what functions those functions are inlining that causes them to
> get larger and noinline them.
>   
Hello,

Some researchers studied this problem and presented it to us when I was 
at university (8 years ago).

The problem is that when you inline too many functions at once, the 
resulting function needs more registers than are available, and the 
remaining variables are accessed from the stack.
If too many variables live on the stack, the resulting inlined function 
performs worse than the non-inlined functions.
This is because a function call first pushes a number of registers onto 
the stack, so the called function can use those registers too.
Some processors even have a special instruction to perform this push in 
one shot.
It seems that GCC is still not able to do this "push some registers" 
step in the middle (or at some point) of a function when inlining.
That's why the resulting inlined function can perform poorly.
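
If this is the cause, the attribute((noinline)) idea quoted above should 
help here as well, since a helper kept out of line gets its own register 
allocation behind a normal call. A minimal sketch in GCC syntax follows. 
The names sad_row and sad_block are made up for illustration and are not 
functions from dsputil_mmx.c, and whether this is actually faster has to 
be measured on each processor:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper. noinline asks GCC to keep it as a real call,
 * even with -finline-functions, so its local variables do not compete
 * for registers with those of the caller. */
__attribute__((noinline))
static int sad_row(const uint8_t *a, const uint8_t *b, size_t n)
{
    int sum = 0;
    for (size_t i = 0; i < n; i++) {
        int d = (int)a[i] - (int)b[i];
        sum += d < 0 ? -d : d;   /* absolute difference */
    }
    return sum;
}

/* Hypothetical caller: sums the absolute differences of a w x h block,
 * calling the out-of-line helper once per row. */
int sad_block(const uint8_t *a, const uint8_t *b,
              size_t stride, size_t w, size_t h)
{
    int total = 0;
    for (size_t y = 0; y < h; y++)
        total += sad_row(a + y * stride, b + y * stride, w);
    return total;
}
```

Building this once with and once without the attribute and comparing the 
size of sad_block in the two object files would show whether the helper 
was inlined.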

Of course this result varies from one processor family to another, since 
they don't all have the same number of registers.
The internal registers used by the pcode (I mean the internal RISC side 
of a CISC processor) probably play a role too, so x86 is not one single 
processor family as far as this problem is concerned.

Cheers,
Laurent



