[MPlayer-dev-eng] [PATCH] Add NEON optimizations to some critical audio functions.

Fri Oct 25 23:17:57 CEST 2013

On Fri, 25 Oct 2013 23:07:55 +0200
Reimar Döffinger <Reimar.Doeffinger at gmx.de> wrote:

> On 25.10.2013, at 21:58, Reimar Döffinger <Reimar.Doeffinger at gmx.de> wrote:
> 
> > One big issue is that lrintf is ridiculously slow on most (all?)
> > ARM Linux distributions, which makes the format conversion take
> > more time than the audio decoding.
> > It uses intrinsics because I was too lazy to learn the inline asm
> > syntax, and for these trivial cases gcc doesn't seem to be able to
> > mess it up.
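
For reference, here is a minimal sketch of the kind of float-to-s16 conversion being discussed. It is not the af_format code from the patch. It assumes float samples nominally in [-1.0, 1.0], scaled by 32768 and saturated to int16. `float_to_s16` and `float_to_s16_scalar` are hypothetical names. The NEON intrinsics path is only compiled when `__ARM_NEON`/`__ARM_NEON__` is defined, and a scalar fallback handles the tail and non-NEON targets. Both paths avoid lrintf and round half away from zero by adding +-0.5 and truncating. This is approximate (values just below .5 can round up), so it is not bit-identical to lrintf's default round-half-to-even.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: one float sample -> s16, saturating,
 * rounding (approximately) half away from zero, no lrintf. */
static int16_t float_to_s16_scalar(float x)
{
    float v = x * 32768.0f;
    /* Clamp first so the float->int cast below is always in range. */
    if (v > 32767.0f)
        v = 32767.0f;
    else if (v < -32768.0f)
        v = -32768.0f;
    /* Add +-0.5, then let the cast truncate toward zero. */
    v += (v < 0.0f) ? -0.5f : 0.5f;
    return (int16_t)v;
}

/* Hypothetical helper: converts n float samples to s16. */
void float_to_s16(int16_t *out, const float *in, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    const float32x4_t scale = vdupq_n_f32(32768.0f);
    const float32x4_t zero  = vdupq_n_f32(0.0f);
    const float32x4_t pos   = vdupq_n_f32(0.5f);
    const float32x4_t neg   = vdupq_n_f32(-0.5f);
    /* 8 samples per iteration: two float vectors in, one int16 vector out. */
    for (; i + 8 <= n; i += 8) {
        float32x4_t a = vmulq_f32(vld1q_f32(in + i), scale);
        float32x4_t b = vmulq_f32(vld1q_f32(in + i + 4), scale);
        /* Select -0.5 for negative lanes, +0.5 otherwise. */
        a = vaddq_f32(a, vbslq_f32(vcltq_f32(a, zero), neg, pos));
        b = vaddq_f32(b, vbslq_f32(vcltq_f32(b, zero), neg, pos));
        /* vcvtq_s32_f32 truncates toward zero (saturating to int32);
         * vqmovn_s32 then saturates each lane to int16. */
        int16x4_t lo = vqmovn_s32(vcvtq_s32_f32(a));
        int16x4_t hi = vqmovn_s32(vcvtq_s32_f32(b));
        vst1q_s16(out + i, vcombine_s16(lo, hi));
    }
#endif
    /* Scalar tail (and the whole buffer on non-NEON targets). */
    for (; i < n; i++)
        out[i] = float_to_s16_scalar(in[i]);
}
```

Clamping before the cast in the scalar path matters: converting an out-of-range float to an integer type is undefined behavior in C, whereas the NEON conversion instructions saturate on their own.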
> 
> And there I once again underestimated how ridiculously bad gcc is.
> Even for the ad_ffmpeg.c code, gcc spills the registers it has just
> loaded straight onto the stack, only to load them back _into the same
> registers_ two instructions later.
> Not surprisingly, this one actually makes things slower (for the
> af_format one I had checked that everything works and gives a good
> speedup).
> Any volunteers to teach me how to do it in inline asm?
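
As a rough starting point, this is a minimal sketch of what the same conversion could look like in GCC extended inline asm. It assumes 32-bit ARM (not AArch64, whose syntax differs) with NEON enabled, e.g. via -mfpu=neon. `float_to_s16_x8` is a hypothetical name, and it handles exactly 8 samples per call. It relies on the fixed-point form of vcvt (#15 fraction bits), which does the *32768 scaling, truncates toward zero and saturates, so no separate multiply is needed. Truncation differs from lrintf's round-to-nearest. Off ARM, a plain C fallback with the same truncating semantics is compiled instead.

```c
#include <stdint.h>

/* Hypothetical helper: converts exactly 8 float samples to s16,
 * truncating toward zero and saturating. */
static void float_to_s16_x8(int16_t out[8], const float in[8])
{
#if defined(__arm__) && (defined(__ARM_NEON) || defined(__ARM_NEON__))
    __asm__ volatile (
        "vld1.32      {d0-d3}, [%[in]]   \n\t" /* load 8 floats into q0, q1 */
        "vcvt.s32.f32 q0, q0, #15        \n\t" /* x * 2^15 -> int32, truncate */
        "vcvt.s32.f32 q1, q1, #15        \n\t"
        "vqmovn.s32   d0, q0             \n\t" /* saturate to int16 */
        "vqmovn.s32   d1, q1             \n\t"
        "vst1.16      {d0-d1}, [%[out]]  \n\t" /* store 8 int16 */
        :
        : [in] "r"(in), [out] "r"(out)
        : "d0", "d1", "d2", "d3", "memory");
#else
    /* Portable fallback with the same truncate-and-saturate behavior. */
    for (int i = 0; i < 8; i++) {
        float v = in[i] * 32768.0f;
        if (v > 32767.0f)
            v = 32767.0f;
        else if (v < -32768.0f)
            v = -32768.0f;
        out[i] = (int16_t)v; /* cast truncates toward zero, like vcvt */
    }
#endif
}
```

Listing the clobbered d registers and "memory" tells gcc not to keep anything in d0-d3 across the asm and not to cache the buffers in registers, so it cannot reorder the loads and stores around it.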

Well, I think there's a reason why ffmpeg doesn't use intrinsics, and
is even abandoning inline asm in favor of external assembler...

Anyway, is af_format (now) faster than lib{av,sw}resample? Both have
ARM asm for s16/float conversion.