[FFmpeg-devel] [PATCH/RFC] ALT_BITSTREAMREADER optimizations?for the platforms with strict alignment

Siarhei Siamashka siarhei.siamashka
Wed Dec 10 02:14:38 CET 2008


On Wednesday 10 December 2008, Siarhei Siamashka wrote:
> On Tuesday 09 December 2008, Michael Niedermayer wrote:
> > On Tue, Dec 09, 2008 at 12:17:03AM +0200, Siarhei Siamashka wrote:
> > > On Monday 08 December 2008, Michael Niedermayer wrote:
> > > > On Mon, Dec 08, 2008 at 12:12:41PM +0200, Siarhei Siamashka wrote:
> >
> > [...]
> >
> > > > > Index: libavcodec/svq1dec.c
> > > > > ===================================================================
> > > > > --- libavcodec/svq1dec.c	(revision 16028)
> > > > > +++ libavcodec/svq1dec.c	(working copy)
> > > > > @@ -195,7 +195,7 @@
> > > > >
> > > > >  #define SVQ1_CALC_CODEBOOK_ENTRIES(cbook)\
> > > > >        codebook = (const uint32_t *) cbook[level];\
> > > > > -      bit_cache = get_bits (bitbuf, 4*stages);\
> > > > > +      bit_cache = get_bits_long (bitbuf, 4*stages);\
> > > > >        /* calculate codebook entries for this vector */\
> > > > >        for (j=0; j < stages; j++) {\
> > > > >          entries[j] = (((bit_cache >> (4*(stages - j - 1))) & 0xF)
> > > > > + 16*j) << (level + 1);\
> > > >
> > > > this code (not only the patch) looks like it needs some work, and
> > > > such work (cleanup) would likely make the 17bit problem go away as a
> > > > side effect.
> > >
> > > Is anybody going to do this cleanup in the near future? I just wke
> > > onder whether you are suggesting me to do this work or whatever.
> >
> > at the risk of a flamewar with you, yes i suggest you do clean these 4
> > lines up instead of making the code alot slower.
>
> ok, thanks, that's the first thing I wanted to clarify.
>
> What do you think would be a proper cleanup in this case? I'm sorry, I
> still can't always understand you correctly with just vague hints and this
> is what may become a source of a flamewar.
>
> This code can be changed either for better readability or for
> better performance.
>
> In the former case, just a simple loop doing calls to 'get_bits(bitbuf, 4)'
> would be a solution.
>
> In the latter case, we want to have as little UPDATE_CACHE operations as
> possible. As the maximum value for 'stages' is 6, we don't need to read
> more than 24 bits and a single UPDATE_CACHE operation is enough for the
> case when MIN_CACHE_BITS is 25 or 32. Original code apparently aimed at
> minimizing bitstream reader related overhead and tried to get '4*stages'
> bits at once (and I can't say that I have something against this solution).
> Then the code proceeds with processing this data in the icky looking loop
> (this is what could be simplified for sure). The only real problem with
> this code is that it does not take into account bitstream readers with
> MIN_CACHE_BITS set to 17.
>
> Currently bitstream reader provides functions:
> 'get_bit1' ensures 1 bit of data
> 'get_bits_long' ensures up to 32 bits of data
> 'get_bits' is specified to guarantee only up to 17 bits, but in reality is
> implemented to provide 25 on all the platforms with unmodified ffmpeg
> sources.
>
> I suggest introducing a complete selection of functions: read 1 bit, read
> up to 9 bits, read up to 17 bits, read up to 25 bits, read up to 32 bits.
> And codec implementors could use appropriate ones according to their
> requirements. Bitstream reader implementation can take advantage of having
> less bits to provide in some cases. For 32-bit x86 platform with
> ALT_BITSTREAM_READER, these special fast cases are 1 and 25 bits
> (implemented with reading int8_t or int32_t). Little endian platforms with
> strict alignment would benefit from all of these functions (the versions
> with less bits guaranteed result in faster and more compact the code).
> 64-bit x86 systems could probably get a fast 32 bit variant as well.
>
> As for this get_bits call in SVQ1_CALC_CODEBOOK_ENTRIES macro, the
> implementor probably really wants to use the variant, which provides up to
> 25 bits.
>
> > You dont have to of course but iam sure you will unerstand that i wont
> > accept a workaround instead of a proper fix.
> > And i doubt you would if you where making the decission and someone
> > else did propose it.
>
> Sorry, I have probably written too much. Anyway, to make a short summary:
> please tell me what kind of modifications to this code you would like to
> see and I'll do it.

Well, instead of writing all these lengthy explanations, a patch to show it in
action would be probably better. It is attached.

Except that I would prefer to have 'get_bits_upto25' (or whatever name suits
it better) defined in 'bitstream.h'. Just because it can be implemented more
efficiently than a call to 'get_bits_long' even on this kind of 17-bit
ALT_BITSTREAM_READER variant that I'm trying to advocate for old ARM
cores (just simply the code straight from the standard 25-bit
ALT_BITSTREAM_READER implementation can be used for it directly) .

-- 
Best regards,
Siarhei Siamashka
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sqv1dec_bitstreamreader_fix_try2.diff
Type: text/x-diff
Size: 1227 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20081210/5d8d08f5/attachment.diff>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part.
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20081210/5d8d08f5/attachment.pgp>



More information about the ffmpeg-devel mailing list