[FFmpeg-devel] [PATCH] h264 CAVLC coeff_token decoder based on CLZ
Sun Jan 24 03:09:13 CET 2010
On Sat, Jan 23, 2010 at 6:01 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> On Sat, Jan 23, 2010 at 04:15:28PM -0800, Jason Garrett-Glaser wrote:
>> On Sat, Jan 23, 2010 at 11:03 AM, Pascal Massimino
>> <pascal.massimino at gmail.com> wrote:
>> > On Sat, Jan 23, 2010 at 10:18 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
>> >> On Sat, Jan 23, 2010 at 03:28:53AM +0300, Anatoliy Nenashev wrote:
>> >> > Hi all!
>> >> > I have made some investigations in H264 CAVLC coeff_token decoder.
>> >> > In attached patch you can see special implementation of VLC decoder for
>> >> > coeff_token which is based on CLZ (count leading zeros).
>> >> > This method reduces the size of the VLC decoding tables for coeff_token from
>> >> > (520+332+280+256)*2 = 2776 bytes to (2*4*16 + 64 + 67 + 63 + 63) = 385
>> >> > bytes.
>> > FWIW: these tables are not called that often,
>> ~8-24 times per MB isn't that often?
> at least 12 times per MB isn't often, because the loop filter SSE2 code is
> written in yasm and thus can't be inlined, forcing us to do 12 calls per MB
GCC's stupidity costs far more clocks than any function call. If you
don't like function call overhead on x86_32, patches are welcome to
add fastcall support to the yasm macros.
Furthermore, I suspect you'd get more benefit from inlining
filter_mb_edgev and friends and *NOT* inlining the deblock code than
doing the reverse.
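
For readers who have not seen the technique behind the patch quoted above,
here is a minimal sketch of CLZ-based VLC decoding. It is not the code from
the patch and it does not use the real coeff_token tables: the 8-symbol
prefix code, the table toy_tab and the functions clz32, toy_decode and
toy_decode_n are all made up for illustration. The idea shown is only the
general one: count the leading zeros of the bit window to pick a class, then
use a couple of bits after the terminating 1 to index a small per-class table.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy prefix code (NOT the H.264 coeff_token code):
 *   1      -> 0      01     -> 1
 *   0010   -> 2      0011   -> 3
 *   0001xx -> 4 + xx (xx = 2 bits)
 * Four or more leading zeros is treated as invalid. */

typedef struct {
    uint8_t len; /* total code length in bits */
    uint8_t sym; /* decoded symbol */
} ToyEntry;

/* Indexed by [leading zero count][2 bits following the terminating 1].
 * Codes that use fewer than 2 suffix bits simply repeat their entry. */
static const ToyEntry toy_tab[4][4] = {
    { {1, 0}, {1, 0}, {1, 0}, {1, 0} },
    { {2, 1}, {2, 1}, {2, 1}, {2, 1} },
    { {4, 2}, {4, 2}, {4, 3}, {4, 3} },
    { {6, 4}, {6, 5}, {6, 6}, {6, 7} },
};

/* Portable count-leading-zeros; a real decoder would use a CPU
 * instruction or compiler builtin instead of this loop. */
static int clz32(uint32_t x)
{
    int n = 0;
    assert(x != 0);
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Decode one symbol from a 32-bit window whose most significant bit is
 * the next unread bit. Returns the symbol and stores the code length in
 * *len, or returns -1 if the window does not start with a valid code. */
static int toy_decode(uint32_t window, int *len)
{
    int z, idx;

    if (!(window >> 28))          /* 4 or more leading zeros */
        return -1;
    z   = clz32(window);          /* 0..3 */
    idx = (window >> (32 - (z + 1) - 2)) & 3;
    *len = toy_tab[z][idx].len;
    return toy_tab[z][idx].sym;
}

/* Decode n symbols from a single window (fine as long as the codes fit
 * in 32 bits). Returns the number of bits consumed, or -1 on error. */
static int toy_decode_n(uint32_t window, int *out, int n)
{
    int i, len, total = 0;

    for (i = 0; i < n; i++) {
        out[i] = toy_decode(window, &len);
        if (out[i] < 0)
            return -1;
        window <<= len;
        total   += len;
    }
    return total;
}
```

For example, the bit string 0011 1 000110 01 packed MSB-first into
0x38C80000 can be passed to toy_decode_n with n = 4. The toy table above
takes 4*4*2 = 32 bytes, whereas a flat lookup indexed by the 6-bit maximum
code length would need 64 entries; that is the same kind of trade the quoted
table-size figures describe, at a much smaller scale.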