[FFmpeg-devel] [PATCH] h264 CAVLC coeff_token decoder based on CLZ
Sun Jan 24 12:24:53 CET 2010
On Sun, Jan 24, 2010 at 03:24:52AM +0100, Michael Niedermayer wrote:
> On Sat, Jan 23, 2010 at 06:09:13PM -0800, Jason Garrett-Glaser wrote:
> > On Sat, Jan 23, 2010 at 6:01 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> > > On Sat, Jan 23, 2010 at 04:15:28PM -0800, Jason Garrett-Glaser wrote:
> > >> On Sat, Jan 23, 2010 at 11:03 AM, Pascal Massimino
> > >> <pascal.massimino at gmail.com> wrote:
> > >> > On Sat, Jan 23, 2010 at 10:18 AM, Michael Niedermayer <michaelni at gmx.at>wrote:
> > >> >
> > >> >> On Sat, Jan 23, 2010 at 03:28:53AM +0300, Anatoliy Nenashev wrote:
> > >> >> > Hi all!
> > >> >> > I have made some investigations into the H264 CAVLC coeff_token decoder.
> > >> >> > In the attached patch you can see a special implementation of the VLC decoder
> > >> >> > for coeff_token, which is based on CLZ (count leading zeros).
> > >> >> > This method reduces the size of the VLC decoding tables for coeff_token from
> > >> >> > (520+332+280+256)*2 = 2776 bytes to (2*4*16 + 64 + 67 + 63 + 63) = 385
> > >> >> > bytes.
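The CLZ idea in the quoted patch can be sketched roughly as follows. This is a minimal toy, assuming a hypothetical prefix code of the form "n zeros, a terminating 1, then a few suffix bits"; it is not the real H.264 coeff_token table set, and `BitReader`, `peek32`, `clz32`, `decode_symbol` and the tables are illustrative names, not FFmpeg's API. Counting the leading zeros of the next 32 bits gives the prefix length, which selects one small table indexed only by the suffix bits, instead of one large table indexed by a fixed number of peeked bits. `__builtin_clz` is a GCC/Clang builtin; the loop is a fallback for other compilers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy prefix code (NOT the real H.264 coeff_token tables):
 * a codeword is n zeros, a terminating 1, then suffix_bits[n] extra bits. */
#define MAX_PREFIX 4
static const uint8_t suffix_bits[MAX_PREFIX]  = {0, 1, 2, 2};
static const uint8_t table_offset[MAX_PREFIX] = {0, 1, 3, 7};
/* One small table per prefix length, stored back to back (1+2+4+4 entries).
 * Identity values for readability; a real decoder would store e.g.
 * (TotalCoeff, TrailingOnes) pairs here. */
static const uint8_t symbols[11] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10};

typedef struct {
    const uint8_t *buf;
    size_t size_bits;
    size_t pos;              /* current bit position, MSB first */
} BitReader;

/* Return the next 32 bits MSB-aligned, padding with zeros past the end. */
static uint32_t peek32(const BitReader *br)
{
    uint32_t v = 0;
    for (int i = 0; i < 32; i++) {
        size_t p = br->pos + (size_t)i;
        uint32_t bit = 0;
        if (p < br->size_bits)
            bit = (br->buf[p >> 3] >> (7 - (p & 7))) & 1u;
        v = (v << 1) | bit;
    }
    return v;
}

/* Count leading zeros of a 32-bit value. */
static int clz32(uint32_t x)
{
    if (x == 0)
        return 32;           /* __builtin_clz(0) is undefined */
#if defined(__GNUC__)
    return __builtin_clz(x);
#else
    int n = 0;
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
#endif
}

/* Decode one symbol: CLZ finds the prefix length, which selects a small
 * table indexed by the suffix bits. Returns -1 for an invalid codeword
 * and leaves the position unchanged in that case. */
static int decode_symbol(BitReader *br)
{
    uint32_t bits = peek32(br);
    int n = clz32(bits);
    if (n >= MAX_PREFIX)
        return -1;
    int k = suffix_bits[n];
    assert(n + 1 + k <= 32);
    /* Drop the zeros and the terminating 1, then keep the top k bits. */
    uint32_t suffix = k ? (bits << (n + 1)) >> (32 - k) : 0;
    br->pos += (size_t)(n + 1 + k);
    return symbols[table_offset[n] + suffix];
}
```

With this toy code, the 3-byte stream `{0xB3, 0x0E, 0x20}` (codewords `1`, `011`, `00110`, `000111`, `000100`) decodes to the symbols 0, 2, 5, 10, 7. The size saving comes from the layout: each prefix length only needs a table as large as its own suffix space.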
> > >> >>
> > >> >
> > >> > FWIW: these tables are not called that often,
> > >>
> > >> ~8-24 times per MB isn't that often?
> > >
> > > At least 12 times per MB isn't often, because the loop filter SSE2 code is
> > > written in yasm and thus can't be inlined, forcing us to do 12 calls per MB
> > > ;)
> > GCC's stupidity costs far more clocks than any function call. If you
> > don't like function call overhead on x86_32, patches are welcome to
> > add fastcall support to the yasm macros.
> I don't like it on x86_64 either
> > Furthermore, I suspect you'd get more benefit from inlining
> > filter_mb_edgev and friends and *NOT* inlining the deblock code than
> > doing the reverse.
> I expected this as well, and I tried it at least twice already, but gcc ...
> I also tried to split the slow loop filter path into intra/inter, and tried
> to unroll the loop so that the first iteration is handled separately, and
> and and.
> gcc doesn't seem to like me and my code
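Peeling off the first loop iteration, as described in the quoted message, can be sketched like this. The example is hypothetical: `filter_edge` is a stand-in that only records the boundary strength it was called with, and only the intra, frame-macroblock case is shown, where H.264 uses bS = 4 on the macroblock edge and bS = 3 on the internal edges. Whether the peeled form is actually faster depends on the code the compiler generates for it, which is exactly the complaint here.

```c
#include <assert.h>

#define EDGES_PER_MB 4  /* luma 4x4 edges in one direction (hypothetical setup) */

/* Hypothetical stand-in for a per-edge deblocking routine: it only records
 * which boundary strength (bS) each edge was filtered with. */
static void filter_edge(int bs_log[EDGES_PER_MB], int edge, int bs)
{
    assert(edge >= 0 && edge < EDGES_PER_MB);
    bs_log[edge] = bs;
}

/* Plain loop: the macroblock-edge special case is tested on every iteration. */
static void filter_intra_loop(int bs_log[EDGES_PER_MB])
{
    for (int edge = 0; edge < EDGES_PER_MB; edge++)
        filter_edge(bs_log, edge, edge == 0 ? 4 : 3);
}

/* First iteration peeled off: the MB edge is handled separately, so the
 * remaining loop body has no edge == 0 branch. */
static void filter_intra_peeled(int bs_log[EDGES_PER_MB])
{
    filter_edge(bs_log, 0, 4);                /* macroblock boundary */
    for (int edge = 1; edge < EDGES_PER_MB; edge++)
        filter_edge(bs_log, edge, 3);         /* internal edges */
}
```

Both functions fill the log with 4, 3, 3, 3; only the control flow differs.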
here's an example:
from my inlined code is turned into:
testl %r15d, %r15d
testl %r13d, %r13d
testb %r9b, %r9b
testb %r14b, %r14b
why is gcc doing this?
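The C fragment behind this assembly is not shown here. As a hedged illustration of the general issue (not Michael's code), the sketch below uses the per-sample filtering condition from H.264 deblocking, assuming bS is checked separately; the function names are hypothetical. Written with `&&`, C's short-circuit semantics let the compiler emit a test and branch per comparison. Since each comparison yields 0 or 1, combining them with `&` gives the same result while allowing all of them to be evaluated unconditionally. Whether gcc then emits a single branch depends on the compiler version and flags.

```c
#include <assert.h>
#include <stdlib.h>

/* Short-circuit form: the compiler may branch after each comparison. */
static int filter_needed_sc(int p1, int p0, int q0, int q1, int alpha, int beta)
{
    return abs(p0 - q0) < alpha && abs(p1 - p0) < beta && abs(q1 - q0) < beta;
}

/* Bitwise form: each comparison is 0 or 1, so & gives the same truth value
 * and all three comparisons can be computed before a single test. */
static int filter_needed_and(int p1, int p0, int q0, int q1, int alpha, int beta)
{
    return (abs(p0 - q0) < alpha) & (abs(p1 - p0) < beta) & (abs(q1 - q0) < beta);
}
```

For example, samples p1=10, p0=12, q0=14, q1=15 with alpha=5 and beta=3 pass in both forms, while q0=40 (a real edge) fails in both.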
btw, are there videos with both alpha_offset < beta_offset and
beta_offset < alpha_offset? If one of the two doesn't occur in practice, we
could drop one of these checks.
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Avoid a single point of failure, be that a person or equipment.