[MPlayer-dev-eng] Using -O4 vs. -Os

Zoltan Hidvegi mplayer at hzoli.2y.net
Wed Oct 15 10:55:20 CEST 2003


> On Tue, Oct 14, 2003 at 05:14:05PM -0500, Zoltan Hidvegi wrote:
> 
> > For the discussion about using -O4 vs. -Os, I've run some benchmarks,
> > on my Athlon XP Thoroughbred 2233 MHz, 194MHz fsb machine, using
> > gcc-3.3.2 prerelease (debian unstable 3.3.2-0pre5).  Compile options
> > for the -Os compile were -Os -march=athlon-4 -mcpu=athlon-4 -pipe
> > -ffast-math -fomit-frame-pointer, and the same with -O4 instead of -Os
> > for the -O4 tests.  Most of the time there is not much difference
> > between -O4 and -Os; -O4 is usually faster, but sometimes -Os is
> > slightly faster (e.g. for the gaussian scale of denoise3d filters).
> > However, for hqdn3d, -Os is 5x slower, which is very strange.
> 
> First of all: there ain't no such thing as -O4; -O3 is the highest

I know that, but mplayer uses -O4 by default, and I was comparing to
the default compile flags.

> optimisation level. I hope you ran the tests more than just once to
> eliminate fluctuation; if so, you should also supply the number of

Of course I did, and the run-to-run fluctuation was 0.1% or less after
the first run.  This is not a research paper, and I have no time to
write a comprehensive report.  I run a kernel with HZ=1000, which
should make the CPU time measurements more accurate.

> -O3 runs a set of more complicated optimisations which can pay off
> sometimes but typically bloats the code. You should also use proper
> alignment but IIRC this is automatically implied by -mcpu=<cpu>.

Yes, alignment is automatic with -O2 and -O3, but -Os probably
disables some alignment to save space.

> -Os optimises for size, which means that it's a good cache saver;
> good locality especially in caches can dramatically boost performance
> and make software fly. 

-Os only optimizes for code size, and I do not think that mplayer is
I-cache limited.  Most of the time is spent in small tight loops.
Mplayer can be data cache limited, but -Os does not affect that.
-funroll-loops may increase I-cache usage, but it is not enabled
by -O3.  On RISC and Itanium -funroll-loops usually makes the code
faster, but on x86 it does not help much.

Actually, gcc-3.3.2 seems to miscompile denoise3d with -Os.

Zoli
