[FFmpeg-devel] [PATCH v4 0/5] avcodec/ac3: Add aarch64 NEON DSP

Mon Apr 8 13:47:56 EEST 2024

On Sat, 6 Apr 2024, Geoff Hill wrote:

> Thanks Martin for your review and testing.
>
> Here's v4 with the following changes:
>
>  * Use fmla in sum_square_butterfly_float loop. Faster.
>
>  * Removed redundant loop bound zero checks in extract_exponents,
>    sum_square_butterfly_int32 and sum_square_butterfly_float.
>
>  * Fixed randomize_int24() to also use negative values.
>
>  * Carry copyright from arm implementation over to aarch64. I
>    did use this version as reference.
>
>  * Fix indentation to match existing aarch64 assembly style.
>
> Tested once again on aarch64 and x86.
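
For anyone reading along, the float butterfly function boils down to four
running sums of squares, which is why a fused multiply-accumulate fits the
loop well. A minimal scalar sketch in C, assuming the usual
left/right/mid/side layout of the four sums (the function name is made up
for illustration, and this is not the code from the patch):

```c
#include <stddef.h>

/* Hypothetical name; plain C sketch, not the NEON code from the patch.
 * Accumulates sums of squares of left, right, mid (l + r) and side (l - r). */
static void sum_square_butterfly_float_sketch(float sum[4],
                                              const float *coef0,
                                              const float *coef1,
                                              size_t len)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;

    for (size_t i = 0; i < len; i++) {
        float lt = coef0[i];
        float rt = coef1[i];
        float md = lt + rt;   /* mid channel  */
        float sd = lt - rt;   /* side channel */

        /* Each line is one multiply-accumulate; a vector fmla can do
         * this for several lanes at once. */
        s0 += lt * lt;
        s1 += rt * rt;
        s2 += md * md;
        s3 += sd * sd;
    }

    sum[0] = s0;
    sum[1] = s1;
    sum[2] = s2;
    sum[3] = s3;
}
```

Note that a vectorized version accumulates in a different order than this
scalar loop, so float results may differ slightly in the last bits.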

Thanks, this set looked good, so I pushed it.

I amended the commits a bit, moving the added copyright line in
checkasm/ac3dsp.c from patch 1 to patch 2, where that file actually gets
extended.

Actually, after pushing, I realized another thing that can be done better 
in ff_ac3_sum_square_butterfly_float_neon - I'll send a patch for that.

> On AWS Graviton2 (t4g.medium), GCC 12.3:
>
> $ tests/checkasm/checkasm --bench --test=ac3dsp
> ...
> NEON:
> - ac3dsp.ac3_exponent_min               [OK]
> - ac3dsp.ac3_extract_exponents          [OK]
> - ac3dsp.float_to_fixed24               [OK]
> - ac3dsp.ac3_sum_square_butterfly_int32 [OK]
> - ac3dsp.ac3_sum_square_butterfly_float [OK]
> checkasm: all 20 tests passed
> float_to_fixed24_c: 2460.5
> float_to_fixed24_neon: 561.5

FWIW, it's usually neater to include such numbers in the commit message,
so that they are carried along into the final git history (showing the
benefit we got from the optimization at the time), quoting only those
functions that are added/modified in each patch. I didn't amend that into
the commit messages this time, but you can keep it in mind for the future.
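
To make the benefit explicit next to the raw numbers, the ratio between the
C and NEON timings is the figure people usually look for. A trivial sketch
in C (the helper name is hypothetical, and the inputs are just the checkasm
timer values quoted above, whatever unit the timer reports):

```c
/* Hypothetical helper: how many times faster the optimized version is,
 * given the reference and optimized timings in the same unit. */
static double speedup(double t_ref, double t_opt)
{
    return t_ref / t_opt;
}
```

For float_to_fixed24, speedup(2460.5, 561.5) comes out at roughly 4.4.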

Anyway, thanks for the patches!

// Martin