[Ffmpeg-devel] VP3/Theora Perfection

Michael Niedermayer michaelni
Thu May 19 12:12:03 CEST 2005


Hi

On Thursday 19 May 2005 04:47, Mike Melanson wrote:
> Hi,
> 	I have replaced unpack_token() with a series of lookup tables in vp3.c.
> Now vp3data.h has more lines than vp3.c. Again, please test as I do not
> have great testing facilities right now. However, I did run a series of
> tests that validated a bunch of decoded tokens against the old function.
>
> 	Numbers for the speed freaks:
>
> [original]
> 1223 dezicycles in unpack_token, 32757 runs, 11 skips
> 1202 dezicycles in unpack_token, 65512 runs, 24 skips
> [new]
> 845 dezicycles in unpack_token, 32735 runs, 33 skips
> 841 dezicycles in unpack_token, 65466 runs, 70 skips
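
For readers unfamiliar with these counters: "dezicycles" are tenths of a CPU
cycle, each line reports a mean per call, and in every line quoted in this
thread runs plus skips is a power of two (for example 32757 + 11 = 32768).
The following is a rough sketch of how such an outlier-skipping accumulator
can work. The struct, the function names and the 8x threshold are assumptions
made for illustration, not a copy of the timing macros in the tree.

```c
#include <stdint.h>

/* Hypothetical accumulator for per-call timings. */
typedef struct {
    uint64_t sum;   /* total ticks over all accepted runs */
    int      runs;  /* accepted measurements */
    int      skips; /* measurements rejected as outliers */
} Timer;

/* Record one measurement. A sample is skipped when it is at least 8x the
 * running mean (assumed threshold), e.g. because of an interrupt or a
 * context switch. Returns 1 if accepted, 0 if skipped. */
static int timer_add(Timer *t, uint64_t ticks)
{
    if (t->runs < 2 || ticks < 8 * t->sum / t->runs) {
        t->sum += ticks;
        t->runs++;
        return 1;
    }
    t->skips++;
    return 0;
}

/* Mean over accepted runs, in tenths of a tick. */
static uint64_t timer_deci_mean(const Timer *t)
{
    return t->runs ? t->sum * 10 / t->runs : 0;
}
```

Feeding it the samples 100, 100, 100, 10000, 120 accepts four of them and
skips the 10000, so a single disturbed call does not distort the mean.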

Well, not here. After a cvs up, the speed of unpack_dct_coeffs() (which
includes unpack_token()) dropped by 20% in my dev tree. To exclude possible
effects of local changes, I tried a clean tree:

[original]
47208165 dezicycles in unpack_dct_coeffs, 64 runs, 0 skips
46909636 dezicycles in unpack_dct_coeffs, 64 runs, 0 skips
47450793 dezicycles in unpack_dct_coeffs, 64 runs, 0 skips

[new]
43178650 dezicycles in unpack_dct_coeffs, 64 runs, 0 skips
42991589 dezicycles in unpack_dct_coeffs, 64 runs, 0 skips
43081780 dezicycles in unpack_dct_coeffs, 64 runs, 0 skips

FLAGS=-O3 -g -Wall -Wno-switch -fomit-frame-pointer -mcpu=athlon -march=athlon

That matches your claim, but it didn't make sense: not only is my dev tree
reacting in the opposite direction, but the code shouldn't be faster, as you
replaced a single, often mispredicted jump through a jump table with a few
if()s, which likely won't be predicted any better.
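
To make the comparison concrete, here is a toy sketch of the two dispatch
styles. The token alphabet, the struct and the table contents are invented
for illustration. They are not the real VP3 tokens and not the contents of
vp3data.h.

```c
/* Hypothetical 4-token alphabet, invented for this sketch (not the VP3 one):
 *   0 = end of block
 *   1 = coefficient +1
 *   2 = coefficient -1
 *   3 = two zeros, then +/-1 with the sign read from one extra bit
 */
typedef struct {
    int coeff;     /* decoded coefficient value */
    int zero_run;  /* zeros preceding the coefficient */
    int eob;       /* 1 if the token ends the block */
} Decoded;

/* Style 1: one switch. A compiler may lower this to a single indirect jump
 * through a jump table, which is hard to predict when tokens vary. */
static Decoded decode_switch(int token, int extra_bit)
{
    Decoded d = { 0, 0, 0 };
    switch (token) {
    case 0: d.eob = 1;                                    break;
    case 1: d.coeff = 1;                                  break;
    case 2: d.coeff = -1;                                 break;
    case 3: d.zero_run = 2; d.coeff = extra_bit ? -1 : 1; break;
    }
    return d;
}

/* Style 2: constant tables indexed by the token, plus an if() for the
 * token that needs an extra bit. */
static const signed char   base_coeff[4] = { 0, 1, -1, 1 };
static const unsigned char run_len[4]    = { 0, 0,  0, 2 };
static const unsigned char ends_block[4] = { 1, 0,  0, 0 };
static const unsigned char has_sign[4]   = { 0, 0,  0, 1 };

static Decoded decode_table(int token, int extra_bit)
{
    Decoded d;
    d.coeff    = base_coeff[token];
    d.zero_run = run_len[token];
    d.eob      = ends_block[token];
    if (has_sign[token] && extra_bit)   /* data-dependent condition */
        d.coeff = -d.coeff;
    return d;
}

/* Returns 1 if both styles decode every (token, extra_bit) pair identically. */
static int styles_agree(void)
{
    int token, bit;
    for (token = 0; token < 4; token++)
        for (bit = 0; bit < 2; bit++) {
            Decoded a = decode_switch(token, bit);
            Decoded b = decode_table(token, bit);
            if (a.coeff != b.coeff || a.zero_run != b.zero_run || a.eob != b.eob)
                return 0;
        }
    return 1;
}
```

The table version still carries a condition that depends on the bitstream,
so moving the work into tables does not by itself make it more predictable.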

Another try with different CFLAGS confirmed my suspicion: your new code seems
slower, but it's smaller, and gcc seems to inline it where it didn't
previously, which would explain the speedup measured with the default flags.
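
One way to separate the inlining effect from the code change itself is to pin
the inlining decision in the source instead of leaving it to gcc's
heuristics. A minimal sketch, using the GCC-specific noinline and
always_inline function attributes. The function names and the body are made
up and merely stand in for a small helper like unpack_token(); they are not
taken from vp3.c.

```c
/* Hypothetical per-token helper body, standing in for a small decode step. */
#define TOKEN_BODY(t) (((t) & 1) ? -((t) >> 1) : ((t) >> 1))

/* Never inlined: every call pays the call/return overhead. */
static __attribute__((noinline)) int token_value_call(int t)
{
    return TOKEN_BODY(t);
}

/* Inlined into every caller, independent of gcc's size heuristics. */
static inline __attribute__((always_inline)) int token_value_inline(int t)
{
    return TOKEN_BODY(t);
}

/* Two otherwise identical loops. Timing them separately isolates the cost
 * of the call from the cost of the body. */
static long sum_with_call(int n)
{
    long sum = 0;
    int i;
    for (i = 0; i < n; i++)
        sum += token_value_call(i);
    return sum;
}

static long sum_with_inline(int n)
{
    long sum = 0;
    int i;
    for (i = 0; i < n; i++)
        sum += token_value_inline(i);
    return sum;
}
```

Both loops compute the same result, so any timing difference between them
comes from the call overhead alone.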

[original]
41514189 dezicycles in unpack_dct_coeffs, 64 runs, 0 skips
41710143 dezicycles in unpack_dct_coeffs, 64 runs, 0 skips
41758835 dezicycles in unpack_dct_coeffs, 64 runs, 0 skips

[new]
43992551 dezicycles in unpack_dct_coeffs, 64 runs, 0 skips
44276594 dezicycles in unpack_dct_coeffs, 64 runs, 0 skips
43972657 dezicycles in unpack_dct_coeffs, 64 runs, 0 skips

OPTFLAGS=-O3 -g -Wall -Wno-switch -fomit-frame-pointer -mcpu=athlon -march=athlon -finline-limit=2000
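
Averaging the three readings in each group gives the relative change
directly. A small sketch of that arithmetic follows; the helper names are
made up here, while the numbers are the ones quoted above.

```c
#include <stdint.h>

/* Mean of three timer readings (dezicycles per run). */
static double mean3(uint64_t a, uint64_t b, uint64_t c)
{
    return (double)(a + b + c) / 3.0;
}

/* Relative change of `after` versus `before`, in percent.
 * Negative means fewer cycles (faster), positive means more (slower). */
static double percent_change(double before, double after)
{
    return (after - before) / before * 100.0;
}

/* Original vs. new code with the default inline limit. */
static double change_default_flags(void)
{
    return percent_change(mean3(47208165, 46909636, 47450793),
                          mean3(43178650, 42991589, 43081780));
}

/* Original vs. new code with -finline-limit=2000. */
static double change_inline_limit(void)
{
    return percent_change(mean3(41514189, 41710143, 41758835),
                          mean3(43992551, 44276594, 43972657));
}
```

This works out to roughly 8.7% fewer dezicycles for the new code with the
default inline limit, and roughly 5.8% more with -finline-limit=2000.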


>
> 	What should I optimize next?

Retry the same function with -finline-limit=2000 :)

-- 
Michael




