[Ffmpeg-devel] [PATCH] H264 cabac vlc reading code

Mon Oct 16 01:19:42 CEST 2006

Hi

On Mon, Oct 16, 2006 at 01:33:31AM +0300, Uoti Urpala wrote:
> On Mon, 2006-10-16 at 00:14 +0300, Uoti Urpala wrote:
> > This sped up timing for decode_cabac_residual from about 4650 dezicycles
> > to 4550. C version with asm disabled sped up more significantly, to
> > 4710. This asm also compiles without -fomit-frame-pointer; disabling asm
> > in the general get_cabac version doesn't seem to hurt performance much
> > (in fact it gave better overall performance in my test) and allows
> > compiling the whole file without -fomit-frame-pointer.
> > 
> > I'll post a patch for that later once I clean up the code a bit and
> > include the latest committed cabac.h change.
> 
> Here's the patch. At the moment it never uses asm code outside
> decode_cabac_residual at all; that could be changed.
> 
> With the latest cabac commit included the patch did not increase cabac
> speed noticeably in my test; however I think this is just a random bad
> case from gcc since the commit increased the speed without this patch
> but not with it, making them equal (and I think there's no reason why
> the commit would not on average speed things up with this patch too).
> 
> What's more remarkable is that in my test with the latest version GCC
> generated equally fast CABAC code from pure C with this patch.

that part is strange, have you tried to let gcc put range and low into
registers? maybe gcc reuses the low & range variables in registers in the
c code and that of course doesnt work with the current asm ...

> 
> I tried testing overall performance but got a silly result: with the
> patch C and asm gave equally fast START/STOP_TIMER timing for
> decode_cabac_residual. However the C version had 2.5% better overall
> performance even though that function which was equally fast when
> separately benchmarked is the only one which changes depending on asm
> being enabled or not...

[...]
> +        "movl %[range], %%ebx                   \n\t"
[...]
> +        :"=&a"(bit), [range]"+m"(c->range), [low]"+m"(c->low), [bytestream]"+m"(c->bytestream) //FIXME this is fragile gcc either runs out of registers or misscompiles it (for example if "+a"(bit) or "+m"(*state) is used
> +        :[state]"r"(state)
> +        : "%ecx", "%ebx", "%edx", "%esi"

this will break on gcc 2.95 and its unneeded, use the %1234 style syntax
if you disslike the numbers due to readability, use a macro
%[range] vs. %"RANGE" both are equally readable

[...]
> -        if( get_cabac( &h->cabac, &h->cabac_state[85 + get_cabac_cbf_ctx( h, cat, n ) ] ) == 0 ) {
> +        if (get_cabac_special(&cabac, &h->cabac_state[85 + get_cabac_cbf_ctx( h, cat, n ) ] ) == 0 ) {

this changes a static to a always_inline static function, which is a change
which should be benchmarked independantly of the rest

>              if( cat == 1 || cat == 2 )
>                  h->non_zero_count_cache[scan8[n]] = 0;
>              else if( cat == 4 )
>                  h->non_zero_count_cache[scan8[16+n]] = 0;
>  
> +	    h->cabac.low = cabac.low;

tabs are forbidden in ffmpeg svn, and besides mixing tabs and spaces for
indention is a bad idea independant of the space vs. ta question

to the change itself, id like to understand why its not faster anymore and why
the c code ends up equally fast as the asm code before this gets commited, ill
look at the generated code if noone else is faster ...

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

In the past you could go to a library and read, borrow or copy any book
Today you'd get arrested for mere telling someone where the library is