[FFmpeg-devel] [PATCH 2/2] lavc/aarch64: h264, add idct for 10bit

Martin Storsjö martin at martin.st
Fri Sep 3 14:35:59 EEST 2021


On Fri, 20 Aug 2021, Mikhail Nitenko wrote:

> Benchmarks:                                             A53     A72
> h264_idct4_add_10bpp_c:                                187.7   115.2
> h264_idct4_add_10bpp_neon:                              72.5    45.0
> h264_idct4_add_dc_10bpp_c:                              96.0    61.2
> h264_idct4_add_dc_10bpp_neon:                           36.0    19.5
> h264_idct8_add4_10bpp_c:                              2115.5  1424.2
> h264_idct8_add4_10bpp_neon:                            734.0   459.5
> h264_idct8_add_10bpp_c:                               1017.5   709.0
> h264_idct8_add_10bpp_neon:                             345.5   216.5
> h264_idct8_add_dc_10bpp_c:                             316.0   235.5
> h264_idct8_add_dc_10bpp_neon:                           69.7    44.0
> h264_idct_add16_10bpp_c:                              2540.2  1498.5
> h264_idct_add16_10bpp_neon:                           1080.5   616.0
> h264_idct_add16intra_10bpp_c:                          784.7   439.5
> h264_idct_add16intra_10bpp_neon:                       641.0   462.2
>
> Signed-off-by: Mikhail Nitenko <mnitenko at gmail.com>
> ---
>
> there is a function that is not covered by tests, but I tested it with
> sample videos, not sure what to do with it

It would be really good to add a checkasm test for it, because assembly 
without checkasm tests can have lots of hidden bugs (although it seems 
fairly straightforward here) that only get uncovered by later compiler 
updates. Not saying that it is the case here, but without a checkasm test 
we don't know.

Overall the patch seems fine, the code is fairly 1:1 copy of the existing 
but with wider SIMD elements, and I presume that the range of values don't 
allow keeping anything in more narrow form.

> +function ff_h264_idct_add8_neon_10, export=1 // NO TESTS but test video 
> looks fine (did not look fine before the fixes so it is definitely 
> working somehow)

I'm not quite sure what you mean here - did it look wrong before 
implementing this function - then we'd have a bug in the C code? Or did it 
look wrong with a broken version of this assembly function, and look right 
after getting it right?

// Martin



More information about the ffmpeg-devel mailing list