[FFmpeg-devel] [PATCH] Dsputilize some functions from APE decode 1/2 - Altivec implementation

Thu Jul 10 19:09:17 CEST 2008

On Thu, Jul 10, 2008 at 08:43:12AM -0600, Loren Merritt wrote:
> On Thu, 10 Jul 2008, Kostya wrote:
> 
> > On Tue, Jul 08, 2008 at 03:18:12PM -0600, Loren Merritt wrote:
> >> Entirely untested (I don't have a ppc), but this looks like it should be
> >> faster. Your other functions would benefit from similar.
> >> For that matter, a whole lot of dsp functions put lvsl inside the loop
> >> when it should be constant (assuming stride%16==0).
> >>
> >> --Loren Merritt
> >
> > It does not work as intended: it modifies only the start of the output array.
> > Thanks for trying anyway.
> 
> That's what I get for trying to write code without a compiler.
> It should be
> +        pv1 += 2;
> +        pv2 += 2;
> 
> --Loren Merritt

Now it works fine, but on my G4 under Mac OS X it gives these numbers
(clocks for 10 million cycles on arrays of length 256):
unoptimized gcc 3.3
  Mine:   6726
  Yours:  7220
gcc-3.3
  Mine:    960
  Yours:  1468
unoptimized gcc 4.0.1
  Mine:   6935
  Yours:  7682
gcc-4.0 -O3
  Mine:   1113
  Yours:  1498
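
To put the gap in relative terms, the figures above can be turned into percentages. This is only a minimal sketch: the struct, array and function names are made up for illustration, and the numbers are just the ones posted above.

```c
#include <assert.h>
#include <stdio.h>

/* One row of the timing table above (clocks, lower is better). */
struct timing {
    const char *config;  /* compiler label exactly as posted */
    int mine;            /* clocks for "Mine" */
    int yours;           /* clocks for "Yours" */
};

/* Figures copied from the table; nothing here is newly measured. */
static const struct timing timings[] = {
    { "unoptimized gcc 3.3",   6726, 7220 },
    { "gcc-3.3",                960, 1468 },
    { "unoptimized gcc 4.0.1", 6935, 7682 },
    { "gcc-4.0 -O3",           1113, 1498 },
};

enum { NUM_TIMINGS = sizeof(timings) / sizeof(timings[0]) };

/* Extra time taken by "Yours" relative to "Mine", in percent. */
double slowdown_percent(const struct timing *t)
{
    assert(t->mine > 0);
    return 100.0 * (t->yours - t->mine) / t->mine;
}

/* Print one line per compiler configuration. */
void print_slowdowns(FILE *out)
{
    for (int i = 0; i < NUM_TIMINGS; i++)
        fprintf(out, "%-22s %5.1f%% slower\n",
                timings[i].config, slowdown_percent(&timings[i]));
}
```

Calling print_slowdowns(stdout) from main prints one line per configuration. The gap comes out at roughly 7% and 11% for the two unoptimized builds, and roughly 53% and 35% for the other two.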

I guess it was MMX that was designed with black magic in mind.