[Ffmpeg-devel] benchmark of different CABAC routines

Michael Niedermayer michaelni
Tue Oct 10 13:28:24 CEST 2006


Hi

On Tue, Oct 10, 2006 at 11:59:14AM +0200, Guillaume Poirier wrote:
> Hi,
> 
> With svn-r6623
> 
> On P-M (2nd generation)
> vanilla:
> 3899 dezicycles in decode_residual, 2095922 runs, 1230 skipsbits/s
> dup=0 drop=0,
> 
> with asm routine of renorm_cabac_decoder_once which has cmov in it
> 3897 dezicycles in decode_residual, 2095978 runs, 1174 skipsbits/s
> dup=0 drop=0
> 
> 
> with CMOV_IS_FAST
> 3825 dezicycles in decode_residual, 2096057 runs, 1095 skipsbits/s
> dup=0 drop=0
> 
> with CMOV_IS_FAST + asm routine of renorm_cabac_decoder_once which has
> cmov in it
> 3807 dezicycles in decode_residual, 2096022 runs, 1130 skipsbits/s
> dup=0 drop=0
> 
> So on P-M (which is a P3 variant), the winner is CMOV_IS_FAST + asm
> routine of renorm_cabac_decoder_once which has cmov in it (2.36% faster)

Benchmark on an 800 MHz Duron with gcc 4.0.3, fastest first:

asm -march=athlon -mcpu=athlon -mtune=athlon
4223 dezicycles in decode_residual, 2095208 runs, 1944 skipsbits/s dup=0 drop=0

BRANCHLESS asm -march=athlon -mcpu=athlon -mtune=athlon
4261 dezicycles in decode_residual, 2095190 runs, 1962 skipsbits/s dup=0 drop=0

BRANCHLESS asm 
4545 dezicycles in decode_residual, 2095109 runs, 2043 skipsbits/s dup=0 drop=0
4548 dezicycles in decode_residual, 2095132 runs, 2020 skipsbits/s dup=0 drop=0
4548 dezicycles in decode_residual, 2095233 runs, 1919 skipsbits/s dup=0 drop=0

BRANCHLESS asm CMOV_IS_FAST
4561 dezicycles in decode_residual, 2095132 runs, 2020 skipsbits/s dup=0 drop=0
4549 dezicycles in decode_residual, 2095219 runs, 1933 skipsbits/s dup=0 drop=0
4580 dezicycles in decode_residual, 2095210 runs, 1942 skipsbits/s dup=0 drop=0
4542 dezicycles in decode_residual, 2095264 runs, 1888 skipsbits/s dup=0 drop=0

C   -march=athlon -mcpu=athlon -mtune=athlon
4596 dezicycles in decode_residual, 2095114 runs, 2038 skipsbits/s dup=0 drop=0

asm
4616 dezicycles in decode_residual, 2094993 runs, 2159 skipsbits/s dup=0 drop=0

C
4849 dezicycles in decode_residual, 2094864 runs, 2288 skipsbits/s dup=0 drop=0

BRANCHLESS C -march=athlon -mcpu=athlon -mtune=athlon
4908 dezicycles in decode_residual, 2094945 runs, 2207 skipsbits/s dup=0 drop=0

BRANCHLESS C
5558 dezicycles in decode_residual, 2094664 runs, 2488 skipsbits/s dup=0 drop=0
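
To make the Duron numbers above easier to compare, here is a minimal Python sketch that averages the repeated runs and expresses each configuration relative to plain C. The dezicycle figures are copied from the list above; the labels are shorthand ("athlon flags" stands for -march=athlon -mcpu=athlon -mtune=athlon), and the helper names `mean` and `relative_to` are made up for this illustration.

```python
# Dezicycles per decode_residual call, copied from the Duron runs above.
# Configurations that were run several times keep all their samples.
results = {
    "asm, athlon flags": [4223],
    "BRANCHLESS asm, athlon flags": [4261],
    "BRANCHLESS asm": [4545, 4548, 4548],
    "BRANCHLESS asm CMOV_IS_FAST": [4561, 4549, 4580, 4542],
    "C, athlon flags": [4596],
    "asm": [4616],
    "C": [4849],
    "BRANCHLESS C, athlon flags": [4908],
    "BRANCHLESS C": [5558],
}


def mean(samples):
    """Arithmetic mean of a non-empty list of samples."""
    return sum(samples) / len(samples)


def relative_to(baseline, table):
    """Percent of time saved versus the baseline (positive means faster)."""
    base = mean(table[baseline])
    return {name: (base - mean(runs)) / base * 100.0
            for name, runs in table.items()}


speedups = relative_to("C", results)

# Print fastest first: label, mean dezicycles, percent saved versus plain C.
for name, pct in sorted(speedups.items(), key=lambda kv: -kv[1]):
    print(f"{name:30s} {mean(results[name]):8.1f} {pct:+7.2f}%")
```

Measured this way, asm with the athlon flags saves about 12.9% versus plain C, while BRANCHLESS C without the flags is slower than plain C. These are single-machine figures, and most configurations have only one sample, so small differences should not be over-read.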

I'll also run benchmarks on an Athlon later if no one is quicker, but I have no P4.


Off-topic, but can someone fix ffmpeg/configure so that it sets -march/-mtune/-mcpu?

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

In the past you could go to a library and read, borrow or copy any book
Today you'd get arrested for mere telling someone where the library is



