[FFmpeg-devel] [PATCH] Unroll base64 decode loop.

Reimar Döffinger Reimar.Doeffinger at gmx.de
Sat Jan 21 16:26:30 CET 2012


On Sat, Jan 21, 2012 at 03:58:45PM +0100, Michael Niedermayer wrote:
> On Sat, Jan 21, 2012 at 12:51:58PM +0100, Reimar Döffinger wrote:
> > On Sat, Jan 21, 2012 at 12:45:09PM +0100, Reimar Döffinger wrote:
> > > Around 50% faster.
> > > decode:       374139 -> 248852 decicycles
> > > syntax check: 236955 -> 123854 decicycles
> > 
> > Note that this is despite gcc failing completely and utterly:
> > seemingly at random, it sometimes makes the "goto out" path the
> > "fast" path and sometimes does not.
> > The code the optimizer creates IMO simply makes no sense.
> > I did not try it with this code, but using the __builtin_expect
> > cluebat did not help one bit on the previous try (which did
> > not use the larger table and thus resulted in even messier code).
> > The numbers mean that it still needs about 24 cycles per byte on
> > the Phenom2. Not sure if I should consider that good or bad...
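The patch itself is not quoted here. The following minimal C sketch is only a rough illustration of what unrolling the decode loop, and the __builtin_expect hint mentioned above, look like. It assumes a 256-entry lookup table in which 0xFF marks bytes outside the base64 alphabet. b64_map, b64_init_map, b64_decode_unrolled and UNLIKELY are hypothetical names, and this is not the code from the patch. The fast loop consumes four characters and writes three bytes per iteration, and a per-character loop handles the final, possibly partial, group. __builtin_expect is a GCC/Clang builtin and only a hint. As noted above, it did not help on the earlier attempt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Branch hint: GCC and Clang provide __builtin_expect; elsewhere it is a no-op. */
#if defined(__GNUC__) || defined(__clang__)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define UNLIKELY(x) (x)
#endif

/* Hypothetical table: 6-bit value of each byte, or 0xFF for bytes that are
 * not base64 digits ('=' and the NUL terminator included). */
static uint8_t b64_map[256];

/* Must be called once before decoding (not thread-safe). */
static void b64_init_map(void)
{
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    memset(b64_map, 0xFF, sizeof(b64_map));
    for (int i = 0; i < 64; i++)
        b64_map[(unsigned char)alphabet[i]] = (uint8_t)i;
}

/* Decodes the NUL-terminated string `in` into `out` (at most out_size bytes)
 * and returns the number of bytes written. Stops at the first non-base64
 * byte; '=' padding is not validated. */
static size_t b64_decode_unrolled(uint8_t *out, size_t out_size, const char *in)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t n = 0;

    assert(in != NULL);

    /* Unrolled fast path: 4 characters -> 3 bytes, no inner loop. Each byte is
     * checked before the next one is loaded, so we never read past the NUL,
     * and nothing is written until the whole group is known to be valid. */
    while (out_size - n >= 3) {
        unsigned a, b, c, d;
        if (UNLIKELY((a = b64_map[p[0]]) == 0xFF)) break;
        if (UNLIKELY((b = b64_map[p[1]]) == 0xFF)) break;
        if (UNLIKELY((c = b64_map[p[2]]) == 0xFF)) break;
        if (UNLIKELY((d = b64_map[p[3]]) == 0xFF)) break;
        uint32_t v = (uint32_t)a << 18 | (uint32_t)b << 12 | (uint32_t)c << 6 | d;
        out[n++] = (uint8_t)(v >> 16);
        out[n++] = (uint8_t)(v >> 8);
        out[n++] = (uint8_t)v;
        p += 4;
    }

    /* Slow path: remaining characters one at a time, emitting a byte
     * whenever at least 8 bits have been collected. */
    uint32_t acc = 0;
    int nbits = 0;
    unsigned bits;
    while ((bits = b64_map[*p++]) != 0xFF) {
        acc = acc << 6 | bits;
        nbits += 6;
        if (nbits >= 8) {
            nbits -= 8;
            if (n == out_size)
                break;
            out[n++] = (uint8_t)(acc >> nbits);
        }
    }
    return n;
}
```

b64_init_map() has to run once before the first call. For "TWFuTWE=", the fast loop decodes "Man" and the slow loop decodes the trailing "Ma", stopping at the '='.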
> 
> I'd consider it bad if it were a human who wrote the asm :)
> 
> Also, it can probably be improved by making the table signed and making
> invalid values negative. Then, if the bits get ORed together, the
> final value will be negative if any input was invalid, so fewer checks
> would be needed.

I don't think so, not without requiring padding.
Admittedly, with valid input there should be enough '=' padding,
but I don't think we can assume that.
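The concern is that the OR trick loads all four characters of a group before checking any of them. With a NUL-terminated string, unpadded or truncated input such as "TW" would make those loads run past the terminator. One possible workaround is sketched below in C. It assumes the decoder is given an explicit input length, which is a hypothetical API and not something the thread proposes. Under that assumption, the single sign test is applied only to groups lying entirely inside the buffer, and a bounded per-character tail handles the rest. b64_smap, b64_init_smap and b64_decode_len are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical signed table: 0..63 for base64 digits, -1 for all other bytes. */
static int8_t b64_smap[256];

/* Must be called once before decoding (not thread-safe). */
static void b64_init_smap(void)
{
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    for (int i = 0; i < 256; i++)
        b64_smap[i] = -1;
    for (int i = 0; i < 64; i++)
        b64_smap[(unsigned char)alphabet[i]] = (int8_t)i;
}

/* Decodes `len` bytes at `in` (no NUL terminator required) into `out` (at
 * most out_size bytes); returns the number of bytes written. Stops at the
 * first non-base64 byte, e.g. '=' padding. */
static size_t b64_decode_len(uint8_t *out, size_t out_size,
                             const char *in, size_t len)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t i = 0, n = 0;

    assert(in != NULL || len == 0);

    /* Fast path: only groups lying entirely inside in[0..len) are loaded, so
     * reading all four bytes before the sign test is safe, padded or not. */
    while (len - i >= 4 && out_size - n >= 3) {
        int a = b64_smap[p[i]], b = b64_smap[p[i + 1]];
        int c = b64_smap[p[i + 2]], d = b64_smap[p[i + 3]];
        if ((a | b | c | d) < 0)
            break; /* '=' or junk in this group: let the tail handle it */
        uint32_t v = (uint32_t)a << 18 | (uint32_t)b << 12 | (uint32_t)c << 6 | (uint32_t)d;
        out[n] = (uint8_t)(v >> 16);
        out[n + 1] = (uint8_t)(v >> 8);
        out[n + 2] = (uint8_t)v;
        n += 3;
        i += 4;
    }

    /* Tail: remaining characters one at a time, never past len. */
    uint32_t acc = 0;
    int nbits = 0;
    for (; i < len && b64_smap[p[i]] >= 0; i++) {
        acc = acc << 6 | (uint32_t)b64_smap[p[i]];
        nbits += 6;
        if (nbits >= 8) {
            nbits -= 8;
            if (n == out_size)
                break;
            out[n++] = (uint8_t)(acc >> nbits);
        }
    }
    return n;
}
```

With the 6-byte buffer {'T','W','F','u','T','W'}, which has no padding and no terminator, this returns 4 bytes ("ManM") without reading beyond the buffer.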

