[FFmpeg-devel] [PATCH/RFC] ALT_BITSTREAMREADER optimizations for the platforms with strict alignment

Siarhei Siamashka siarhei.siamashka
Mon Dec 8 23:17:03 CET 2008


On Monday 08 December 2008, Michael Niedermayer wrote:
> On Mon, Dec 08, 2008 at 12:12:41PM +0200, Siarhei Siamashka wrote:
[...]
> > Index: libavcodec/bitstream.h
> > ===================================================================
> > --- libavcodec/bitstream.h	(revision 16028)
> > +++ libavcodec/bitstream.h	(working copy)
> > @@ -424,9 +424,29 @@
> >  #   define SKIP_CACHE(name, gb, num)\
> >          name##_cache >>= (num);
> >  # else
> > +
> > +#if !ENABLE_FAST_UNALIGNED
> > +#   undef MIN_CACHE_BITS
> > +#   define MIN_CACHE_BITS 17
> >  #   define UPDATE_CACHE(name, gb)\
> > +    {\
> > +        const uint8_t *bufptr = ((const uint8_t
> > *)(gb)->buffer)+(name##_index>>3);\ +        int shiftcount =
> > name##_index & 7;\
> > +        uint8_t b1 = bufptr[1];\
> > +        uint8_t b0 = bufptr[2];\
> > +        uint8_t b2 = bufptr[0];\
> > +        shiftcount += 8;\
> > +        name##_cache = b0 | (b1 << 8);\
> > +        name##_cache |= (b2 << 16);\
> > +        name##_cache <<= shiftcount;\
> > +    }\
> > +
> > +#else
> > +#   define UPDATE_CACHE(name, gb)\
> >          name##_cache= AV_RB32( ((const uint8_t
> > *)(gb)->buffer)+(name##_index>>3) ) << (name##_index&0x07);\
> >
> > +#endif
> > +
> >  #   define SKIP_CACHE(name, gb, num)\
> >          name##_cache <<= (num);
> >  # endif
>
> is there any advantage in this over using one of the readers that do not do
> unaligned accesses?

Those readers still need to do endian conversion, which makes them not very 
much attractive. Endian conversion is implemented using 4 instructions per
each 32-bit value on the ARM devices without a dedicated REV instruction
(introduced in ARMv6). Just reading data byte after byte and combining them in
right order is 7 instructions (4 loads and 3 OR operations). So the difference
is not so big and other factors also play their role.

A32_BITSTREAM_READER is not very good overall, because it blows up code size
too much, wasting precious space in L1 cache. I remember it was set as default
for ARM after running some benchmarks with ffmp3 decoder. Appears that it is
quite good for MP3 decoder code because this decoder has a lot of small
constant size calls to 'get_bits' function (for which UPDATE_CACHE from
A32_BITSTREAM_READER is mostly only doing counter check and a conditional
jump over the rest of code). On the other hand, ALT_BITSTREAM_READER can also
get a special version of UPDATE_CACHE macro to be used for the cases when we
need less than 10 bits and I suspect that it can win even for ffmp3 decoder.
Otherwise it should be always possible to have bitstream reader preferences
set for each decoder individually, this only needs to do a somewhat dumb work
of performing a large scale codecs and bitstream readers benchmarking :)

> > Index: libavcodec/svq1dec.c
> > ===================================================================
> > --- libavcodec/svq1dec.c	(revision 16028)
> > +++ libavcodec/svq1dec.c	(working copy)
> > @@ -195,7 +195,7 @@
> >
> >  #define SVQ1_CALC_CODEBOOK_ENTRIES(cbook)\
> >        codebook = (const uint32_t *) cbook[level];\
> > -      bit_cache = get_bits (bitbuf, 4*stages);\
> > +      bit_cache = get_bits_long (bitbuf, 4*stages);\
> >        /* calculate codebook entries for this vector */\
> >        for (j=0; j < stages; j++) {\
> >          entries[j] = (((bit_cache >> (4*(stages - j - 1))) & 0xF) +
> > 16*j) << (level + 1);\
>
> this code (not only the patch) looks like it needs some work, and such
> work (cleanup) would likely make the 17bit problem go away as a side
> effect.

Is anybody going to do this cleanup in the near future? I just wonder whether
you are suggesting me to do this work or whatever.

BTW, the attached workaround finally makes the regression test pass. I'm not
proposing to apply it as is, it only shows where the last 17bit problem was
hidden.

-- 
Best regards,
Siarhei Siamashka
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bitstream_reader_17bit_workaround.diff
Type: text/x-diff
Size: 822 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20081209/45c43f5c/attachment.diff>



More information about the ffmpeg-devel mailing list