[FFmpeg-devel] [patch][OpenHEVC]added ASM functions for epel + qpel

Mon Mar 3 20:28:35 CET 2014

Hi,

The SBUTTERFLY is there to get coeff1 and coeff2 into the same register; Ronald's explanation is more or less spot on.
This allows the use of maddubs (pmaddubsw) and add instructions instead of mul and hadd, which are more costly IIRC.
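
To make the pairing concrete, here is a rough scalar model in C of what one 16-bit lane of pmaddubsw computes once two pixels and two coefficients sit next to each other. The names maddubs_lane and filter4_paired are made up for this mail, and the 4-tap layout is only an assumption for illustration, not a copy of the patch:

```c
#include <stdint.h>

/* Scalar model of one 16-bit lane of pmaddubsw: two unsigned bytes from the
 * first operand are multiplied by two signed bytes from the second operand,
 * and the two products are added with signed 16-bit saturation. */
static int16_t maddubs_lane(uint8_t p0, uint8_t p1, int8_t c0, int8_t c1)
{
    int32_t sum = (int32_t)p0 * c0 + (int32_t)p1 * c1;
    if (sum > INT16_MAX) sum = INT16_MAX;
    if (sum < INT16_MIN) sum = INT16_MIN;
    return (int16_t)sum;
}

/* Hypothetical 4-tap filter for one output sample: taps are handled in
 * pairs (one pmaddubsw each), then the partial sums are combined with a
 * plain add (paddw). No separate multiply or horizontal add is needed. */
static int16_t filter4_paired(const uint8_t *src, const int8_t coeff[4])
{
    int16_t lo = maddubs_lane(src[0], src[1], coeff[0], coeff[1]);
    int16_t hi = maddubs_lane(src[2], src[3], coeff[2], coeff[3]);
    return (int16_t)(lo + hi);
}
```

In the SIMD version each register holds eight such lanes, which is why the inputs have to be interleaved first.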

> 2014-03-03 15:23 GMT+01:00 Pierre Edouard Lepere
> <Pierre-Edouard.Lepere at insa-rennes.fr>:
> > here's a new version of the patches. The first one did not change, but
> the second changed by adding macros, diminishing substantially the code.
>
> I don't understand why you need to shuffle the input pixels with
> SBUTTERFLY. Anyway, I feel uncomfortable being the only reviewer when
> my reviews take like 2 minutes and are far from thorough.
>
> Also, one last trick would be to use pmulhrw to perform the
> rounding+shift in one instruction but that's, again, not something
> worth more wait.

>I agree the final round/shift should be done using pmulhrsw and an appropriate constant if that is feasible. I also noticed you're doing psllw
>mx, 6 after some punpcklbws. Why?

Are you referring to this?
    punpcklbw         m1, m0, m15
    psllw             m1, 6

It basically puts the 8-bit values into 16-bit words, then shifts them left by 6.
The C code is:
dst[x] = src[x] << (14 - BIT_DEPTH);
with dst as int16_t and src as uint8_t. With BIT_DEPTH = 8, the shift is 14 - 8 = 6.
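
As a small sketch of how the two instructions map onto that C line, assuming BIT_DEPTH is 8 and that m15 holds zero (the function name put_pixels_scalar is invented here for illustration):

```c
#include <stddef.h>
#include <stdint.h>

#define BIT_DEPTH 8  /* assumption: 8-bit input, so the shift is 6 */

/* Scalar equivalent of the punpcklbw + psllw pair for a row of pixels. */
static void put_pixels_scalar(int16_t *dst, const uint8_t *src, size_t width)
{
    for (size_t x = 0; x < width; x++) {
        /* punpcklbw with a zero register: zero-extend the byte to a word */
        uint16_t widened = src[x];
        /* psllw 6: scale to the 14-bit intermediate precision */
        dst[x] = (int16_t)(widened << (14 - BIT_DEPTH));
    }
}
```

The largest result is 255 << 6 = 16320, so it always fits in a signed 16-bit word.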

I'll try to find a good coefficient so that the pmulhrsw instruction can replace the current shifts.
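
For reference, a scalar sketch of why a constant can work with pmulhrsw (the SSSE3 instruction mentioned in the review). Per lane it computes (((a * b) >> 14) + 1) >> 1, so with b = 1 << (15 - shift) it should match a rounding right shift by `shift`. The helper names are made up, shift is assumed to be in the range 1..15, and the code relies on arithmetic right shift of negative values, as gcc and clang provide:

```c
#include <stdint.h>

/* Scalar model of one 16-bit lane of pmulhrsw. */
static int16_t mulhrs_lane(int16_t a, int16_t b)
{
    int32_t prod = (int32_t)a * b;
    return (int16_t)(((prod >> 14) + 1) >> 1);
}

/* Candidate constant for a rounding shift by `shift` bits (1..15 only;
 * shift == 0 would need 32768, which does not fit in a signed word). */
static int16_t round_shift_const(int shift)
{
    return (int16_t)(1 << (15 - shift));
}

/* The explicit add-offset-then-shift form that the multiply would replace. */
static int16_t round_shift_ref(int16_t a, int shift)
{
    return (int16_t)(((int32_t)a + (1 << (shift - 1))) >> shift);
}
```

For a shift of 6 the constant would be 512, which replaces an add of 32 followed by an arithmetic shift with a single multiply. Whether this fits every code path in the patch still has to be checked.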

Regards,