[FFmpeg-devel] [PATCH] strict-aliasing-safe aes.c

Reimar Döffinger Reimar.Doeffinger
Wed Jun 30 00:05:26 CEST 2010


On Tue, Jun 29, 2010 at 11:29:42PM +0200, Michael Niedermayer wrote:
> On Tue, Jun 29, 2010 at 07:27:44PM +0200, Reimar D?ffinger wrote:
> > On Tue, Jun 29, 2010 at 06:16:03PM +0200, Reimar D?ffinger wrote:
> > > On Tue, Jun 29, 2010 at 05:10:09PM +0100, M?ns Rullg?rd wrote:
> > > > Reimar D?ffinger <Reimar.Doeffinger at gmx.de> writes:
> > > > >> Are these API changes necessary?  Can't the rest of the fixes be done
> > > > >> without this?
> > > > >
> > > > > No, they are not necessary, however I am in favour of at least having
> > > > > av_aes_block available to allow for easy allocation of data
> > > > > with suitable alignment.
> > > > 
> > > > Exposing internals like that is less than ideal.  Simply documenting
> > > > the 8-byte alignment requirement should be enough.
> > > 
> > > I guess I agree, I'll make the change if the idea is ok'd in principle.
> > 
> > Oh well, I did it anyway already since I noticed I broke the more-than-one
> > block case.
> 
> >  aes.c |   74 +++++++++++++++++++++++++++++++++++++-----------------------------
> >  1 file changed, 42 insertions(+), 32 deletions(-)
> > 172e32357bec3f033440078a57b908ff696472d7  aesalias.diff
> 
> if its confirmed that there is no speedloss and the object file does not
> become larger than i wont object to the patch. But its still ugly

Well, I had to test with gcc-4.3, since 4.4 does not produce working code...
Pure assembler size increases from 3957 bytes to 4025, for what I can tell
mostly because gcc randomly decides to use some extra bytes for the local
variables, which just makes it having to use 4-byte instead of 1-byte offsets
to address function parameters, which in turn then causes some alignment
to increase as well.
As for speed, with aes-test the results seem to indicate a surprising
2% performance advantage for the new code (I'd expect due to some
other random code or data re-shuffling):
old:
13377 dezicycles in aes, 8191 runs, 1 skips
13387 dezicycles in aes, 8192 runs, 0 skips
13368 dezicycles in aes, 8191 runs, 1 skips
13360 dezicycles in aes, 8191 runs, 1 skips
14279 dezicycles in aes, 8191 runs, 1 skips

new:
12929 dezicycles in aes, 8191 runs, 1 skips
12926 dezicycles in aes, 8191 runs, 1 skips
12916 dezicycles in aes, 8191 runs, 1 skips
12913 dezicycles in aes, 8191 runs, 1 skips
12973 dezicycles in aes, 8191 runs, 1 skips

I'd welcome someone else to test.
Does this seem good enough to you?



More information about the ffmpeg-devel mailing list