[MPlayer-dev-eng] [PATCH] Add NEON optimizations to some critical audio functions.

Reimar Döffinger Reimar.Doeffinger at gmx.de
Fri Oct 25 23:48:40 CEST 2013


On 25.10.2013, at 23:17, wm4 <nfxjfg at googlemail.com> wrote:
> On Fri, 25 Oct 2013 23:07:55 +0200
> Reimar Döffinger <Reimar.Doeffinger at gmx.de> wrote:
> 
>> 
>> 
>> On 25.10.2013, at 21:58, Reimar Döffinger <Reimar.Doeffinger at gmx.de> wrote:
>> 
>>> One big issue is that lrintf is ridiculously slow on most (all?)
>>> ARM Linux distributions, which makes the format conversion take
>>> more time than the audio decoding.
>>> It uses intrinsics because I was too lazy to learn the inline asm
>>> syntax and for these trivial cases gcc doesn't seem to be able to
>>> mess it up.
>> 
>> And there I once again underestimated how ridiculously bad gcc is.
>> Already for the ad_ffmpeg.c code gcc spills the registers it loaded directly onto the stack, just to load them _into the same registers_ again two instructions later.
>> Not surprisingly, this one actually makes things slower (for the af_format one I had checked that everything works and that it gives a good speedup).
>> Any volunteers to teach me how to do it in inline asm?
> 
> Well, I think there's a reason why ffmpeg doesn't use intrinsics, and
> is even abandoning inline asm in favor of external assembler...
> 
> Anyway, is af_format (now) faster than lib{av,sw}resample? Both have
> ARM asm for s16/float conversion.

No idea, and I don't really care whether it ends up using 0.3% or 0.5% of CPU. But around 10% (and thus a lot more than the decoding itself) is not acceptable.
My first guess is that libswresample is both faster and more accurate.
And I wonder if I have an off-by-one error that would end up doubling the volume.

