[FFmpeg-devel] [PATCH] avcodec/cfhd: add x86 SIMD

Fri Aug 14 13:32:47 EEST 2020

Aug 13, 2020, 18:23 by onemda at gmail.com:

> Hi,
>
> patch attached.
>
> Please review and/or benchmark, especially .asm file.
>

I took a look. It's just the horizontal pass of an inverse 2-6 DWT with clipping.
The code is so simple that I wasn't able to find any obvious ways to improve it,
except perhaps replacing the "mov xq, 0" with "xor xq, xq", since I think
xor is more universally recognized by x86 CPUs as "zeroing a register", so it'll
just allocate a pre-zeroed one. I could be wrong though, it's just what everyone uses.
Maybe call it idwt_26_horiz instead of the vague horiz_filter, since that's what it is?

It's also called once per line everywhere, in a loop of 1 call and 3 adds.
You could easily incorporate the loop into the function to reduce call
overhead if you want to (and I think you should look into it, but I won't block
the patch just for that). Registers might be a tight fit on 32-bit systems then,
but even using the stack should be faster than a hot function call.
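
To illustrate what I mean, here is a rough C sketch of the two shapes. All
names and signatures are made up for the example, and row_kernel is only a
placeholder sum/difference, not the actual 2-6 filter; the point is just where
the per-line loop lives.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-line kernel (NOT the real 2-6 filter). */
static void row_kernel(int16_t *out, const int16_t *low,
                       const int16_t *high, int len)
{
    for (int i = 0; i < len; i++) {
        out[2 * i + 0] = (int16_t)((low[i] + high[i]) / 2);
        out[2 * i + 1] = (int16_t)((low[i] - high[i]) / 2);
    }
}

/* Current shape: the caller loops, paying 1 call and 3 pointer adds per line. */
static void filter_per_line(int16_t *out, ptrdiff_t out_stride,
                            const int16_t *low, ptrdiff_t low_stride,
                            const int16_t *high, ptrdiff_t high_stride,
                            int len, int height)
{
    for (int y = 0; y < height; y++) {
        row_kernel(out, low, high, len);
        out  += out_stride;
        low  += low_stride;
        high += high_stride;
    }
}

/* Suggested shape: the function takes the strides and the height itself,
 * so the line loop runs inside it and there is one call per plane. */
static void filter_all_lines(int16_t *out, ptrdiff_t out_stride,
                             const int16_t *low, ptrdiff_t low_stride,
                             const int16_t *high, ptrdiff_t high_stride,
                             int len, int height)
{
    for (int y = 0; y < height; y++) {
        for (int i = 0; i < len; i++) {
            out[2 * i + 0] = (int16_t)((low[i] + high[i]) / 2);
            out[2 * i + 1] = (int16_t)((low[i] - high[i]) / 2);
        }
        out  += out_stride;
        low  += low_stride;
        high += high_stride;
    }
}
```

In the asm version the three strides and the height would be extra arguments,
which is where the register pressure on 32-bit comes from.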

Aside from those nitpicks, LGTM.

SIMDing the remaining DSP function (interlaced_vertical_filter) should help a lot
too, though that function is pretty much trivial, since it's just an average + deinterleave.
That function should 100% have its 3-line loop incorporated into it, however, as you'll
definitely have no shortage of registers, even on 32-bit systems.
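
For reference, a minimal C sketch of that function with the loop folded in.
The name, the signature and the exact arithmetic are my assumptions (a halved
difference goes to the even line and a halved sum to the odd line, with strides
counted in int16_t elements), and any clipping is left out, so check it against
the C code before writing the asm.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name and signature: each input line pair (low, high)
 * produces two output lines, so out advances by two lines per iteration. */
static void interlaced_vertical_rows(int16_t *out, ptrdiff_t out_stride,
                                     const int16_t *low, ptrdiff_t low_stride,
                                     const int16_t *high, ptrdiff_t high_stride,
                                     int width, int height)
{
    for (int y = 0; y < height; y++) {
        int16_t *even = out;              /* first output line of the pair  */
        int16_t *odd  = out + out_stride; /* second output line of the pair */

        for (int i = 0; i < width; i++) {
            even[i] = (int16_t)((low[i] - high[i]) / 2); /* assumed: no clip */
            odd[i]  = (int16_t)((low[i] + high[i]) / 2);
        }

        low  += low_stride;
        high += high_stride;
        out  += 2 * out_stride;
    }
}
```

The inner loop only needs the two loads, a sum, a difference and two stores,
so the line counter and strides fit comfortably alongside it.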