[Ffmpeg-devel] benchmark of different CABAC routines

Michael Niedermayer michaelni
Wed Oct 11 21:41:57 CEST 2006


Hi

On Wed, Oct 11, 2006 at 09:07:14PM +0200, Michael Niedermayer wrote:
[...]
> > 
> > The modified non-branchless C version has
> > 
> > 	uint8_t tmp = s + 2;
> > 	if (tmp < 126)
> > 	    s = tmp;
> > 	*state = s;
> > 
> > instead of
> > 
> >         *state= ff_h264_mps_state[s];
> > 
> > Writing it that way instead of the "s += 2; if (s < 128) *state = s"
> > which was there earlier (and was slower) makes gcc use cmov instead of a
> > branch and is faster.
> 
> interresting, i will experiment with this a little ...

I've used the following patch:

@@ -396,7 +396,13 @@
         "shl %%cl, %%edx                        \n\t"
         "shl %%cl, %%ebx                        \n\t"
 #endif
+#if 1
+        "leal 2(%%eax), %%ecx                   \n\t"
+        "cmpl $124, %%eax                       \n\t"
+        "cmovae %%eax, %%ecx                    \n\t"
+#else
         "movzbl "MANGLE(ff_h264_mps_state)"(%%eax), %%ecx   \n\t"
+#endif
         "movb %%cl, (%1)                        \n\t"
 //eax:state ebx:low, edx:range, esi:RangeLPS
         "test %%bx, %%bx                        \n\t"


P3:
branched asm:
4110 dezicycles in decode_residual, 2094505 runs, 2647 skips
4126 dezicycles in decode_residual, 2094479 runs, 2673 skips

branched asm + patch:
4172 dezicycles in decode_residual, 2094355 runs, 2797 skips
4177 dezicycles in decode_residual, 2094341 runs, 2811 skips

Athlon:
branched asm:
4067 dezicycles in decode_residual, 2096725 runs, 427 skips
4088 dezicycles in decode_residual, 2096733 runs, 419 skips
4089 dezicycles in decode_residual, 2096753 runs, 399 skips

branched asm + patch:
4066 dezicycles in decode_residual, 2096708 runs, 444 skips
4092 dezicycles in decode_residual, 2096747 runs, 405 skips
4065 dezicycles in decode_residual, 2096759 runs, 393 skips

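So as far as I can see, there's no speed gain from this: on the P3 the
patched version is about 1.4% slower (roughly 4175 vs. 4118 dezicycles on
average), and on the Athlon the difference is within run-to-run noise.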

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

In the past you could go to a library and read, borrow or copy any book
Today you'd get arrested for mere telling someone where the library is



