[FFmpeg-devel] [PATCH] lavc/alsdec: Add NEON optimizations

Thilo Borgmann thilo.borgmann at mail.de
Mon Mar 1 20:23:53 EET 2021


Hi Martin,

>> it's my first attempt to do some assembly, so it might still include some don'ts of the asm world...
>> Tested with gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
>>
>> Speed-wise, it sees a drop for small prediction orders until around 10 or 11.
>> Well, the maximum prediction order is 1023.
>> I therefore checked with the "real-world" samples from the fate-suite, which suggests low prediction orders are non-dominant:
>>
>> pred_order = {7..17}, gain: 23%
>>
>> als_reconstruct_all_c: 26645.2
>> als_reconstruct_all_neon: 20635.2
> 
> This is the combination that the patch actually tests by default, if I read the code correctly - right?

exactly.


> You didn't write what CPU you tested this on - do note that the actual performance of the assembly is pretty heavily dependent on the CPU.
> 
> I get roughly similar numbers if I build with GCC:
> 
>                          Cortex A53      A72      A73
> als_reconstruct_all_c:     107708.2  44044.5  57427.7
> als_reconstruct_all_neon:   78895.7  38464.7  34065.5

It was a remote machine, I don't know the exact CPU yet. Will find out for v2.


> However - if I build with Clang, where vectorization isn't disabled by configure, the C code beats the handwritten assembly:
> 
>                        Cortex A53
> als_reconstruct_all_c:    69145.7
> als_reconstruct_all_neon: 78895.7
> 
> Even if I only test order 17, the C code still is faster. So clearly we can do better - if nothing else, we could copy the assembly code that Clang outputs :-)

Narf. Well, maybe some more thought about the code itself will get more speed out of the handwritten version...
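
For reference, the relative gains behind the numbers quoted above can be recomputed with a small helper. This is only a sketch: the name gain_percent is made up for illustration and is not part of the patch or of checkasm, and it assumes the benchmark figures are "lower is better" timings.

```c
#include <assert.h>

/* Hypothetical helper, not part of FFmpeg: relative gain of the NEON
 * version over the C version, in percent. Assumes both arguments are
 * timings where a lower value means faster code. A negative result
 * means the NEON version is slower than the C version. */
static double gain_percent(double c_time, double neon_time)
{
    assert(c_time > 0.0);
    return (c_time - neon_time) / c_time * 100.0;
}
```

With the figures from this thread, gain_percent(26645.2, 20635.2) gives about 22.6 (the 23% quoted above), while the Clang build on the Cortex A53, gain_percent(69145.7, 78895.7), gives about -14.1, i.e. the handwritten NEON code is slower than the auto-vectorized C code there.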


> First a couple technical details about the patch...
> [...]

I very much appreciate your extensive feedback, I will need quite some time to work through it! :)

Thanks!
-Thilo

