[FFmpeg-devel] [PATCH/RFC] ALT_BITSTREAMREADER optimizations for the platforms with strict alignment
Michael Niedermayer
michaelni
Mon Dec 8 11:52:28 CET 2008
On Mon, Dec 08, 2008 at 12:12:41PM +0200, Siarhei Siamashka wrote:
> Hello,
>
> The attached patch improves performance for ALT_BITSTREAMREADER on the
> platforms with strictly aligned or slow unaligned memory accesses. Current
> implementation results in the generation of 4 memory accesses reading data
> byte after byte for such platforms, which is a bit excessive (only reads
> for 3 bytes are needed to ensure availability of 17 bits in bitstreamreader
> cache). Legacy ARM cores (which do not support unaligned memory accesses)
> can benefit from it.
>
> Unfortunately FFmpeg contains some code which violates MIN_CACHE_BITS == 17
> requirement. One of such examples is 'svq1_dec.c'. Patch to fix it is also
> attached, but it probably makes sense to add a new function which would accept
> 1-25 range ('get_bits_medium'?) and not to lose performance on the platforms
> where ALT_BITSTREAM_READER actually can provide 25 bits. Some tweaks in this
> area are possible.
>
> Regression test still fails, so there are some other bugs to find and fix.
>
> --
> Best regards,
> Siarhei Siamashka
> Index: libavcodec/bitstream.h
> ===================================================================
> --- libavcodec/bitstream.h (revision 16028)
> +++ libavcodec/bitstream.h (working copy)
> @@ -424,9 +424,29 @@
> # define SKIP_CACHE(name, gb, num)\
> name##_cache >>= (num);
> # else
> +
> +#if !ENABLE_FAST_UNALIGNED
> +# undef MIN_CACHE_BITS
> +# define MIN_CACHE_BITS 17
> # define UPDATE_CACHE(name, gb)\
> + {\
> + const uint8_t *bufptr = ((const uint8_t *)(gb)->buffer)+(name##_index>>3);\
> + int shiftcount = name##_index & 7;\
> + uint8_t b1 = bufptr[1];\
> + uint8_t b0 = bufptr[2];\
> + uint8_t b2 = bufptr[0];\
> + shiftcount += 8;\
> + name##_cache = b0 | (b1 << 8);\
> + name##_cache |= (b2 << 16);\
> + name##_cache <<= shiftcount;\
> + }\
> +
> +#else
> +# define UPDATE_CACHE(name, gb)\
> name##_cache= AV_RB32( ((const uint8_t *)(gb)->buffer)+(name##_index>>3) ) << (name##_index&0x07);\
>
> +#endif
> +
> # define SKIP_CACHE(name, gb, num)\
> name##_cache <<= (num);
> # endif
is there any advantage in this over using one of the readers that do not do
unaligned accesses?
> Index: libavcodec/svq1dec.c
> ===================================================================
> --- libavcodec/svq1dec.c (revision 16028)
> +++ libavcodec/svq1dec.c (working copy)
> @@ -195,7 +195,7 @@
>
> #define SVQ1_CALC_CODEBOOK_ENTRIES(cbook)\
> codebook = (const uint32_t *) cbook[level];\
> - bit_cache = get_bits (bitbuf, 4*stages);\
> + bit_cache = get_bits_long (bitbuf, 4*stages);\
> /* calculate codebook entries for this vector */\
> for (j=0; j < stages; j++) {\
> entries[j] = (((bit_cache >> (4*(stages - j - 1))) & 0xF) + 16*j) << (level + 1);\
this code (not only the patch) looks like it needs some work, and such
work (cleanup) would likely make the 17bit problem go away as a side effect.
> @@ -652,7 +652,7 @@
> init_get_bits(&s->gb,buf,buf_size*8);
>
> /* decode frame header */
> - s->f_code = get_bits (&s->gb, 22);
> + s->f_code = get_bits_long (&s->gb, 22);
>
> if ((s->f_code & ~0x70) || !(s->f_code & 0x60))
> return -1;
this hunk is ok
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Everything should be made as simple as possible, but not simpler.
-- Albert Einstein
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20081208/b4a3da36/attachment.pgp>
More information about the ffmpeg-devel
mailing list