[FFmpeg-devel] Extend/optimize RGB to RGB conversions funcs into rgb2rgb.c
yann.lepetitcorps at free.fr
Mon Sep 10 02:47:24 CEST 2012
With a larger number of tests/iterations, the results fluctuate much less.
RGB->RGBA and RGBA->RGB conversions tests (npixels=1024 niters=65536)
Test original rgb24to32() func : 182 ms
Test new rgb24to32_alpha() func : 177 ms
Test original rgba32to24() func : 138 ms
Test modified rgba32to24() func : 142 ms
rgb24to32() : original=182ms modified=177ms (5ms 2.82%)
rgba32to24() : original=138ms modified=142ms (-4ms -2.82%)
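The percentages in the two summary lines appear to be the time difference divided by the modified version's time (5/177 is about 2.82%, -4/142 is about -2.82%). A minimal C sketch of that arithmetic, assuming this is how the test program computes them; the helper name pct_gain() is hypothetical:

```c
#include <assert.h>

/* Hypothetical helper reproducing the summary figures:
 * gain = (original - modified) / modified * 100.
 * A positive value means the modified function is faster. */
static double pct_gain(double original_ms, double modified_ms)
{
    assert(modified_ms > 0.0); /* avoid division by zero */
    return (original_ms - modified_ms) / modified_ms * 100.0;
}
```

For example, pct_gain(182, 177) gives about 2.82, pct_gain(138, 142) about -2.82, and pct_gain(24, 23) about 4.35, which matches the figures quoted in this thread.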
The new rgb24to32_alpha() function is faster than the original rgb24to32(),
with the alpha handling for free :)
Conversely, the modified rgba32to24() is a little slower than the original
version :(
=> Tomorrow I will look at the asm output to understand exactly why...
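For reference, here is a minimal sketch of what an rgb24to32_alpha()-style converter could look like. It is not the actual patch: the signature, the explicit alpha argument and the byte order (channels 0 and 2 swapped, mirroring the rgba32to24() diff quoted further down) are assumptions. It uses pointer walking instead of 3*i / 4*i indexing and takes alpha from the caller instead of hard-coding 255:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical rgb24to32_alpha()-style converter (assumed signature).
 * src_size is the number of input bytes (3 per pixel); dst must hold
 * src_size / 3 * 4 bytes. Channels 0 and 2 are swapped (assumed order). */
static void rgb24to32_alpha(const uint8_t *src, uint8_t *dst,
                            int src_size, uint8_t alpha)
{
    const uint8_t *end = src + src_size;

    assert(src_size % 3 == 0); /* whole pixels only */

    /* Advance both pointers per pixel instead of computing 3*i and 4*i. */
    while (src < end) {
        dst[0] = src[2];
        dst[1] = src[1];
        dst[2] = src[0];
        dst[3] = alpha; /* caller-supplied instead of a fixed 255 */
        src += 3;
        dst += 4;
    }
}
```

Passing alpha = 255 should reproduce the behavior described for rgb24to32(), which always writes 255 into the alpha byte.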
@+
Yannoo
Quoting yann.lepetitcorps at free.fr:
> Right, I have rebenchmarked it, but with the -O9 parameter on GCC, and the
> runtime difference between the original and new versions is relatively small:
>
> Test original rgb24to32() func : 28 ms
> Test new rgb24to32_alpha() func : 28 ms
> Test original rgba32to24() func : 24 ms
> Test modified rgba32to24() func : 23 ms
>
> rgb24to32() : original=28ms modified=28ms (0ms 0.00%)
>
> rgba32to24() : original=24ms modified=23ms (1ms 4.35%)
>
> Note that the results fluctuate quite a lot, with differences between -15%
> and +15% (the "new" rgba32to24() generally seems faster than the "old" one,
> but the new rgb24to32_alpha() is regularly slower than rgb24to32() [though it
> handles the alpha parameter, whereas rgb24to32() always sets the alpha to
> 255])
>
>
> @+
> Yannoo
>
>
>
> Quoting Loren Merritt <lorenm at u.washington.edu>:
>
> > On Mon, 10 Sep 2012, yann.lepetitcorps at free.fr wrote:
> > > Quoting Reimar Döffinger <Reimar.Doeffinger at gmx.de>:
> > >
> > >> Though one thing I wonder is why exactly that is faster, and why your
> > >> compiler can't figure out how to optimize it on its own.
> > >> There is also the issue that, compared to NEON-optimizing the code,
> > >> this is a rather minor optimization.
> > >
> > > I think it is a little faster because of this:
> > >
> > > - dst[3 * i + 0] = src[4 * i + 2];
> > > - dst[3 * i + 1] = src[4 * i + 1];
> > > - dst[3 * i + 2] = src[4 * i + 0];
> > >
> > > + dst[0] = psrc[2];
> > > + dst[1] = psrc[1];
> > > + dst[2] = psrc[0];
> > >
> > > => the copy is done with "direct" addressing, i.e. without
> > > multiplications or additions in the [] array indexing
> > > (can the compiler handle the * 3 multiplication automatically, for free?)
> >
> > It's not that a *3 is free, but rather that the addressing mode of the
> > generated instructions doesn't have to be the same as the one in the
> > source code. GCC is normally capable of switching from index variables to
> > pointer incrementing or vice versa, though it doesn't always choose
> > optimally when to do so.
> >
> > --Loren Merritt
>
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
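Loren's point about addressing modes can be illustrated by putting the two loop shapes from the quoted diff side by side. A minimal C sketch, assuming 4-byte input pixels and a src_size given in bytes; the function names and the loops wrapped around the quoted statements are hypothetical. Both functions produce identical output, and whether the compiler turns one form into the other is up to its optimizer, so only the generated asm (e.g. from gcc -S) or a benchmark can show which is faster on a given machine:

```c
#include <assert.h>
#include <stdint.h>

/* Indexed form, as in the "-" lines of the diff:
 * the offsets 3*i and 4*i are written out explicitly. */
static void rgba32to24_indexed(const uint8_t *src, uint8_t *dst, int src_size)
{
    assert(src_size % 4 == 0); /* whole pixels only */

    for (int i = 0; 4 * i < src_size; i++) {
        dst[3 * i + 0] = src[4 * i + 2];
        dst[3 * i + 1] = src[4 * i + 1];
        dst[3 * i + 2] = src[4 * i + 0];
    }
}

/* Pointer-increment form, as in the "+" lines: psrc and dst are
 * advanced directly, so no multiplication appears in the source. */
static void rgba32to24_pointer(const uint8_t *src, uint8_t *dst, int src_size)
{
    const uint8_t *psrc = src;
    const uint8_t *end  = src + src_size;

    assert(src_size % 4 == 0); /* whole pixels only */

    while (psrc < end) {
        dst[0] = psrc[2];
        dst[1] = psrc[1];
        dst[2] = psrc[0]; /* psrc[3] (alpha) is dropped */
        psrc += 4;
        dst  += 3;
    }
}
```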