[FFmpeg-devel] [PATCH] lavc/aarch64: h264qpel, add lowpass_8 based functions

Martin Storsjö martin at martin.st
Fri Sep 3 14:26:15 EEST 2021


On Fri, 3 Sep 2021, Martin Storsjö wrote:

>> +function \type\()_h264_qpel8_v_lowpass_neon_10
>> +        ld1             {v16.8H}, [x1], x3
>> +        ld1             {v18.8H}, [x1], x3
>> +        ld1             {v20.8H}, [x1], x3
>> +        ld1             {v22.8H}, [x1], x3
>> +        ld1             {v24.8H}, [x1], x3
>> +        ld1             {v26.8H}, [x1], x3
>> +        ld1             {v28.8H}, [x1], x3
>> +        ld1             {v30.8H}, [x1], x3
>> +        ld1             {v17.8H}, [x1], x3
>> +        ld1             {v19.8H}, [x1], x3
>> +        ld1             {v21.8H}, [x1], x3
>> +        ld1             {v23.8H}, [x1], x3
>> +        ld1             {v25.8H}, [x1]
>> +
>> +        transpose_8x8H  v16, v18, v20, v22, v24, v26, v28, v30, v0,  v1
>> +        transpose_8x8H  v17, v19, v21, v23, v25, v27, v29, v31, v0,  v1
>> +        lowpass_8_10    v16, v17, v18, v19, v16, v17
>> +        lowpass_8_10    v20, v21, v22, v23, v18, v19
>> +        lowpass_8_10    v24, v25, v26, v27, v20, v21
>> +        lowpass_8_10    v28, v29, v30, v31, v22, v23
>> +        transpose_8x8H  v16, v17, v18, v19, v20, v21, v22, v23, v0,  v1
>
> I'm a bit surprised by doing this kind of vertical filtering by transposing 
> and doing it horizontally - when vertical filtering can be done so 
> efficiently as-is without needing any extra 'ext' instructions and such. But 
> I see that the existing code does it this way. I'll give it a try to make a 
> PoC of rewriting the existing code for some case to see how it behaves 
> without the transposes.

The potential speedup for the vertical filters is actually huge; I've sent
a patch that IMO simplifies the existing code by getting rid of all the
transposes. I'd appreciate it if you'd rework your patch along the same lines.
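
To illustrate why the transposes aren't needed, here is a rough scalar C
model of what an 8x8 vertical lowpass for 10 bit computes. This is only a
sketch, not the code from either patch: the names clip10, round_shift_clip10
and qpel8_v_lowpass_10_ref are made up for this example, strides are assumed
to be in elements rather than bytes, and src is assumed to point at the first
output row (so rows -2..+10 are read, 13 rows in total like the 13 loads
above). Each output row is just a weighted sum of six whole input rows, so
with one row per register the multiply-accumulates work lane by lane as-is.

```c
#include <stddef.h>
#include <stdint.h>

/* Clip to the 10-bit pixel range [0, 1023]. */
static inline uint16_t clip10(int v)
{
    return v < 0 ? 0 : v > 1023 ? 1023 : (uint16_t)v;
}

/* Round, shift right by 5 and clip; negative sums saturate to 0
 * without relying on right-shifting a negative value. */
static inline uint16_t round_shift_clip10(int sum)
{
    int v = sum + 16;
    return v < 0 ? 0 : clip10(v >> 5);
}

/*
 * Hypothetical scalar reference for an 8x8 vertical 6-tap lowpass
 * with the H.264 taps (1, -5, 20, 20, -5, 1).
 * Assumptions: strides are in elements, src points at the first
 * output row, and rows src[-2 * stride] .. src[10 * stride] are readable.
 */
static void qpel8_v_lowpass_10_ref(uint16_t *dst, const uint16_t *src,
                                   ptrdiff_t dst_stride, ptrdiff_t src_stride)
{
    for (int y = 0; y < 8; y++) {
        const uint16_t *r = src + y * src_stride;
        for (int x = 0; x < 8; x++) {
            /* Every tap comes from a different row but the same column x,
             * so no shuffling within a row is required. */
            int sum =       r[x - 2 * src_stride]
                    -  5 *  r[x - 1 * src_stride]
                    + 20 *  r[x]
                    + 20 *  r[x + 1 * src_stride]
                    -  5 *  r[x + 2 * src_stride]
                    +       r[x + 3 * src_stride];
            dst[y * dst_stride + x] = round_shift_clip10(sum);
        }
    }
}
```

In a SIMD version the inner loop over x would correspond to one 8-lane
operation per tap, and consecutive output rows can reuse five of the six
input row registers.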

// Martin