[Libav-user] gcc auto-vectorisation

Tue Feb 26 17:07:40 CET 2013

On Feb 26, 2013, at 16:53, Claudio Freire wrote:

> Um... AFAIR, SSE doesn't crash on misaligned access, it just performs poorly.

On MS Windows, that kind of access can provoke a crash.
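
To make the distinction concrete, here is a minimal sketch, assuming an x86 target with SSE. As far as I know, the aligned load `_mm_load_ps` requires a 16-byte aligned address and may fault otherwise, while `_mm_loadu_ps` accepts any address. The helper names `is_aligned16` and `sum4` are made up for this illustration and come from no library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical helper: nonzero if p sits on a 16-byte boundary. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical helper: sum the four floats starting at p.
 * With SSE available, pick the aligned load only when it is safe. */
static float sum4(const float *p)
{
    assert(p != NULL);
#if defined(__SSE__)
    /* _mm_load_ps needs 16-byte alignment; _mm_loadu_ps does not. */
    __m128 v = is_aligned16(p) ? _mm_load_ps(p) : _mm_loadu_ps(p);
    float out[4];
    _mm_storeu_ps(out, v);
    return out[0] + out[1] + out[2] + out[3];
#else
    /* Scalar fallback for targets without SSE. */
    return p[0] + p[1] + p[2] + p[3];
#endif
}
```

Calling `sum4(buf + 1)` on a 16-byte aligned `float buf[8]` goes through the unaligned load. Forcing `_mm_load_ps` on that address is the case that can crash.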

>> Of course, but as far as I have understood not in this case, because Apple makes such intensive use of SIMD throughout its APIs/SDKs.
> 
> Ok, but intrinsics and auto-vectorization aren't the same thing, I was
> just wondering if gcc knew about the alignment.

No, they aren't, though in the end they both lead to the use of MMX/SSE instructions.
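
On the question of whether gcc knows about the alignment: one way to tell it explicitly is `__builtin_assume_aligned`, available in reasonably recent gcc (4.7 and later, if I recall correctly) and in clang. The sketch below is only an illustration. The function name `scale_aligned16` is hypothetical, and the promise of 16-byte alignment is the caller's responsibility, since breaking it is undefined behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: dst[i] = src[i] * k for n floats.
 * Both pointers MUST be 16-byte aligned; the assert checks that in
 * debug builds, the builtin passes the same promise to the compiler. */
static void scale_aligned16(float *restrict dst, const float *restrict src,
                            float k, size_t n)
{
    assert(((uintptr_t)dst & 15u) == 0);
    assert(((uintptr_t)src & 15u) == 0);
#if defined(__GNUC__)
    float *d = __builtin_assume_aligned(dst, 16);
    const float *s = __builtin_assume_aligned(src, 16);
#else
    /* No alignment hint available; plain pointers still work. */
    float *d = dst;
    const float *s = src;
#endif
    for (size_t i = 0; i < n; i++)
        d[i] = s[i] * k;
}
```

With the hint in place the vectoriser is free to use aligned loads and stores without emitting a run-time alignment check or a peeling prologue. Whether it actually does depends on the compiler version and flags.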

> I guess the only possibility then is to compare the resulting assembly
> to try to spot why SIMD isn't outperforming scalar code. It's all in
> the details, but it should outperform it, if there isn't too much
> overhead.

Overhead can come from setting things up for vectorised code, but it can also take other forms. Auto-vectorisation may have evolved to handle more kinds of code (loops), but that still doesn't mean that everything can be vectorised (efficiently), nor that a running app spends a significant amount of its time in the parts that can be. I think that's what happens here, at least in the test suite. That doesn't mean there is no benefit whatsoever, but my knowledge of the different parts of the ffmpeg libs is too limited to start looking for it.
When I have time I might compare builds that use none of the hand-coded optimisations or assembly code, to see whether auto-vectorisation has a benefit there.
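
As a rough illustration of "not everything can be vectorised", here are two toy loops that have nothing to do with the ffmpeg sources (both function names are invented). The first has independent iterations and is the textbook candidate. The second carries a dependency from one iteration to the next, which a straightforward vectoriser will normally leave scalar.

```c
#include <assert.h>
#include <stddef.h>

/* Independent iterations: each out[i] depends only on a[i] and b[i].
 * 'restrict' promises the arrays do not overlap. */
static void add_arrays(float *restrict out, const float *restrict a,
                       const float *restrict b, size_t n)
{
    assert(out != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* Loop-carried dependency: x[i] needs the updated x[i-1], so the
 * iterations cannot simply run four at a time. */
static void prefix_sum(float *x, size_t n)
{
    assert(x != NULL);
    for (size_t i = 1; i < n; i++)
        x[i] += x[i - 1];
}
```

To see what the compiler decided, newer gcc versions can report it with something like `gcc -O3 -fopt-info-vec-optimized -fopt-info-vec-missed`, and older 4.x releases had `-ftree-vectorizer-verbose=N` for the same purpose. Check which of these your version accepts.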

And of course, if someone has a good idea of a subsystem that doesn't yet benefit from extensive hand-coded optimisations and can provide a simple test case, I'm perfectly willing to do some comparisons.

R.