[FFmpeg-devel] [PATCH] libavcodec/hevcdsp: port SIMD idct functions from 32-bit.

Reimar Döffinger Reimar.Doeffinger at gmx.de
Sat Jan 16 03:13:11 EET 2021



> On 15 Jan 2021, at 23:55, Martin Storsjö <martin at martin.st> wrote:
> 
> On Tue, 12 Jan 2021, Reimar.Doeffinger at gmx.de wrote:
> 
>> create mode 100644 libavcodec/aarch64/hevcdsp_idct_neon.S
>> create mode 100644 libavcodec/aarch64/hevcdsp_init_aarch64.c
> 
> This patch fails checkasm

Fixed, one mis-translated coefficient index...

>> 
>> +.macro fixsqrshrn d, dt, n, m
>> +  .ifc \dt, .8H
>> +        sqrshrn2        \d\dt, \n\().4S, \m
>> +  .else
>> +        sqrshrn         \n\().4H, \n\().4S, \m
>> +        mov             \d\().D[0], \n\().D[0]
>> +  .endif
>> +.endm
> 
> Is this to set the lower half of the dest register, without wiping the upper part? It looks a bit clumsy (and stall-prone on in-order cores), but I guess it's not easy to do things differently without rewriting the higher level structure?

Did not have a good idea, but I was also aiming for keeping the structure of the 32-bit and 64-bit code similar.
In particular since I did not know about checkasm and expected some further painful debug.

> 
>> +.macro transpose_8x8 r0, r1, r2, r3, r4, r5, r6, r7
>> +        transpose8_4x4  \r0, \r1, \r2, \r3
>> +        transpose8_4x4  \r4, \r5, \r6, \r7
>> +.endm
>> +
> 
> There's a bunch of existing transposes in libavcodec/aarch64/neon.S - can they be used? Not that they're rocket science, though... And I see that the existing arm version also has got its own transpose macros.
> 
> If it's inconvenient to use shared macros, this is fine.

They are different and seem to not be documented, so it would take some
time to figure out how to replace them.
There’s also a bit of a question if I’d want to give up alignment
with the 32-bit code just yet.


> 
>> +        // Transpose each 4x4 block, and swap how d4-d7 and d8-d11 are used.
>> +        // Layout before:
>> +        // d0  d1
>> +        // d2  d3
>> +        // d4  d5
>> +        // d6  d7
>> +        // d8  d9
>> +        // d10 d11
>> +        // d12 d13
>> +        // d14 d15
> 
> These layouts don't look like they're up to date for the aarch64 version?

Removed in new version as it seems not that useful.

>> 
>> +        vst1.s32        {\in7}, [r3, :128]
>> +.endm
> 
> This is left behind untranslated (and thus unused)

Removed. I am not sure if that means it’s also unused in the 32-bit version.

>> +
>> +        movrel          x1, trans + 16
> 
> The movrel macro has got a separate third optional argument for the offset, so write this "movrel x1, trans, 16". (Older versions of llvm are picky with the symbol offset syntax and break with this, as the macro right now adds its own implicit +(0) here. If you pass the offset in the macro parameter, all the offsets get passed within the parentheses.

Changed.

>> +.macro idct_16x16 bitdepth
>> +function ff_hevc_idct_16x16_\bitdepth\()_neon, export=1
>> +//r0 - coeffs
>> +        mov             x15, lr
>> +
> 
> Binutils doesn't recognize "lr" as alias for x30

It didn’t have an issue in the Debian unstable VM?
That seems like the kind of workaround where it would be
better to leave a comment with more info, if you know
what exactly is affected.



More information about the ffmpeg-devel mailing list