[FFmpeg-devel] [PATCH] lavc/alsdec: Add NEON optimizations
Thilo Borgmann
thilo.borgmann at mail.de
Mon Mar 1 20:23:53 EET 2021
Hi Martin,
>> It's my first attempt at assembly, so it might still include some don'ts of the asm world...
>> Tested with gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
>>
>> Speed-wise, performance drops for small prediction orders, up to around 10 or 11.
>> Well, the maximum prediction order is 1023.
>> I therefore checked the "real-world" samples from the fate-suite, which suggest that low prediction orders are not dominant:
>>
>> pred_order = {7..17}, gain: 23%
>>
>> als_reconstruct_all_c:    26645.2
>> als_reconstruct_all_neon: 20635.2
>
> This is the combination that the patch actually tests by default, if I read the code correctly - right?
Exactly.
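For reference, here is a simplified scalar sketch of the kind of per-sample prediction loop being benchmarked. It is not the code from alsdec.c or from the patch: the function name reconstruct_lpc_sketch, the in-place layout, the 20-bit fixed-point coefficients and the sign convention are assumptions for illustration only. The inner dot product runs `order` times per output sample, so with small orders a NEON version plausibly has too little work per sample to amortize its setup, which would be consistent with the drop below order 10 or 11.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar sketch, NOT FFmpeg's actual implementation.
 * Undoes a short-term LPC prediction in place: on entry, samples[n]
 * holds the residual for n >= order, and the first `order` entries
 * hold already reconstructed history. Coefficients are assumed to be
 * fixed-point with 20 fractional bits, so the sum is rounded by adding
 * 1 << 19 before shifting right by 20. */
static void reconstruct_lpc_sketch(int32_t *samples, size_t length,
                                   const int32_t *coefs, int order)
{
    assert(order >= 1 && order <= 1023);  /* ALS maximum prediction order */

    for (size_t n = (size_t)order; n < length; n++) {
        int64_t y = INT64_C(1) << 19;     /* rounding offset */

        /* Dot product over the previous `order` samples; this is the
         * loop a SIMD version would vectorize. */
        for (int k = 0; k < order; k++)
            y += (int64_t)coefs[k] * samples[n - 1 - k];

        /* Assumed sign convention: residual minus prediction.
         * Right-shifting a negative y is implementation-defined in C
         * (arithmetic on GCC and Clang). */
        samples[n] -= (int32_t)(y >> 20);
    }
}
```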
> You didn't write what CPU you tested this on - do note that the actual performance of the assembly is pretty heavily dependent on the CPU.
>
> I get roughly similar numbers if I build with GCC:
>
>                            Cortex A53        A72        A73
> als_reconstruct_all_c:       107708.2    44044.5    57427.7
> als_reconstruct_all_neon:     78895.7    38464.7    34065.5
It was a remote machine; I don't know exactly which CPU yet. I will find out for v2.
> However - if I build with Clang, where vectorization isn't disabled by configure, the C code beats the handwritten assembly:
>
>                            Cortex A53
> als_reconstruct_all_c:        69145.7
> als_reconstruct_all_neon:     78895.7
>
> Even if I only test order 17, the C code is still faster. So clearly we can do better - if nothing else, we could copy the assembly code that Clang outputs :-)
Narf. Well, maybe some more thought about the code itself will get more speed out of a handwritten version...
> First a couple technical details about the patch...
> [...]
I very much appreciate your extensive feedback; it will take me quite some time to work through it! :)
Thanks!
-Thilo