[FFmpeg-devel] [PATCH] Channels correlation

Thu Oct 29 21:44:06 CET 2009

On Thu, Oct 29, 2009 at 08:02:55PM +0100, Nicolas George wrote:
> > Though even if you don't want to do that, at least you could,
> > if you ensure that nc1 != 0 (probably a good idea anyway), do
> > something like below, freeing the registers otherwise used for n1 and n2
> > (or just use a variable n = FFMIN(n1, n2) )
> > d1_end = d1 + nc1 * FFMIN(n1, n2);
> 
> I tried several variations on that theme, but none showed any improvement
> over the current code, and some simple changes that should
> obviously speed things up ended up slowing them down.
> 
> I think that I am interfering with the compiler optimizations, and that any
> benchmark results would be highly sensitive to the version of the compiler.

Not really, it is mostly that almost no compilers are able to handle x86's
stack-based floating-point unit even remotely sensibly.
However you should at least start with extracting the constant sample1[i] into
a separate variable. If you don't believe me, look at the assembler code and
you'll see that gcc is _not_ able to figure that out on its own (actually it
simply can't, since you didn't use restrict on any of the pointer variables,
but even with that it won't).
I also guess that unrolling the innermost loop should give quite an advantage,
particularly if you make special versions for common values of nch1.
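
To illustrate the hoisting point, here is a minimal sketch. The patch's actual
loop is not quoted in this mail, so everything below is a hypothetical
reconstruction: the function names, the interleaved float layout, the row-major
double accumulator and the parameter names (s1, s2, corr, nc1, nc2, n) are all
assumptions. In the first version the store to corr[] may alias s1[], so the
compiler has to reload s1[i] on every inner iteration; the second version reads
it once into a local and adds restrict (C99) to the pointers.

```c
#include <stddef.h>

/* Hypothetical baseline, NOT the code from the patch:
 * corr[i * nc2 + j] += s1[i] * s2[j], summed over n interleaved frames. */
static void correlate_naive(const float *s1, const float *s2, double *corr,
                            int nc1, int nc2, size_t n)
{
    for (size_t k = 0; k < n; k++) {
        for (int i = 0; i < nc1; i++)
            for (int j = 0; j < nc2; j++)
                /* s1[i] is re-read here on each j: the store to corr[]
                 * could alias it as far as the compiler knows. */
                corr[i * nc2 + j] += (double)s1[i] * s2[j];
        s1 += nc1;
        s2 += nc2;
    }
}

/* Same computation with the loop-invariant sample hoisted by hand. */
static void correlate_hoisted(const float *restrict s1,
                              const float *restrict s2,
                              double *restrict corr,
                              int nc1, int nc2, size_t n)
{
    for (size_t k = 0; k < n; k++) {
        for (int i = 0; i < nc1; i++) {
            const double a = s1[i];       /* constant over the j loop */
            double *row = corr + i * nc2; /* row of the accumulator   */
            for (int j = 0; j < nc2; j++)
                row[j] += a * s2[j];
        }
        s1 += nc1;
        s2 += nc2;
    }
}
```

Both functions compute the same sums, so they can be swapped freely while
benchmarking. Whether the second one is actually faster depends on the
compiler and target, so it is worth checking the generated assembler rather
than assuming it.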

> For some reason, computing the lower half of the matrix, on the other hand,
> causes a big slowdown. Again, I think this is too close to the compiler
> optimization process to draw any conclusion: I expect that depending on the
> architecture and compiler version, almost any version can be the fastest or
> the slowest.

Once again confirming my rule: If code speed changes strangely with source
code changes, the compiler-generated assembler will make you weep.