[FFmpeg-devel] [Patch]x86/hevc : new idct + ASM

James Almer jamrial at gmail.com
Mon Jun 2 22:32:09 CEST 2014


On 02/06/14 6:15 AM, Pierre Edouard Lepere wrote:
> +%macro TRANSFORM_DC_ADD 2
> +cglobal hevc_put_transform%1x%1_dc_add_%2, 4, 6, 4, dst, coeffs, stride, col_limit, temp

4, 5, 4. You're using only one temp reg, not two.

> +    xor            tempw, tempw

No need for this. The mov below overwrites the bits of the reg that are actually used. Same with
the "xor tempq, tempq" and "pxor m2, m2" a couple instructions below: both are overwritten by
the mov and the load that follow them.

> +    mov            tempw, [coeffsq]
> +    add            tempw, 1
> +    sar            tempw, 1
> +    add            tempw, [add_%2]

Why use constants for a single value when you can use immediates?

%if %2 == 8
    add tempw, 32
%else
    add tempw, 8
%endif
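
As a cross-check of the two immediates, here is a minimal scalar C sketch of the DC computation
in the quoted instructions. It assumes the %else branch is the 10-bit case and that the rounding
offset is half of (1 << shift); the function names are made up for illustration and are not part
of the patch. The asm works on a 16-bit register, while the sketch uses plain int.

```c
#include <stdint.h>

/* Hypothetical helper: rounding offset for the final shift, i.e. half of
 * (1 << shift) with shift = 14 - bit_depth. Gives 32 for 8-bit and 8 for
 * 10-bit, matching the immediates suggested above. */
static int dc_round_offset(int bit_depth)
{
    int shift = 14 - bit_depth;
    return 1 << (shift - 1);
}

/* Hypothetical helper mirroring the quoted sequence:
 *   add tempw, 1 / sar tempw, 1 / add tempw, offset / sar tempw, 14-bit_depth
 * Note: >> on a negative int is implementation-defined in C; mainstream
 * compilers implement it as an arithmetic shift, like sar. */
static int dc_value(int16_t coeff, int bit_depth)
{
    int shift = 14 - bit_depth;
    int dc = (coeff + 1) >> 1;
    return (dc + dc_round_offset(bit_depth)) >> shift;
}
```

For example, dc_value(64, 8) evaluates to ((65 >> 1) + 32) >> 6, which is 1.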

> +    sar            tempw, 14-%2
> +    movd              m0, tempd
> +    punpcklwd         m0, m0
> +    pshufd            m0, m0, 0

Use SPLATW here. It will come in handy if you use mmx registers as Ronald suggested for
the 4x4 case. Just make sure to declare the functions as mmxext and not mmx: plain mmx
doesn't have pshuf* instructions, so SPLATW will instead expand into four punpck* instructions.
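
To make the broadcast concrete, here is a portable C emulation of the quoted punpcklwd + pshufd
pair on a 128-bit register viewed as eight 16-bit words. The helper names are hypothetical and
only model the two instructions; this is not the SPLATW macro itself. It also shows that only the
low word of the source ends up in the result, so whatever sits in the upper half of tempd is
irrelevant.

```c
#include <stdint.h>
#include <string.h>

/* Models "punpcklwd r, r": interleave the four low words with themselves,
 * giving w0 w0 w1 w1 w2 w2 w3 w3. */
static void punpcklwd_self(uint16_t r[8])
{
    uint16_t t[8];
    for (int i = 0; i < 4; i++) {
        t[2 * i]     = r[i];
        t[2 * i + 1] = r[i];
    }
    memcpy(r, t, sizeof t);
}

/* Models "pshufd r, r, 0": copy dword 0 (words 0 and 1) into all four dwords. */
static void pshufd_0(uint16_t r[8])
{
    for (int i = 1; i < 4; i++) {
        r[2 * i]     = r[0];
        r[2 * i + 1] = r[1];
    }
}

/* The two steps together broadcast word 0 to every word of the register. */
static void splat_low_word(uint16_t r[8])
{
    punpcklwd_self(r);
    pshufd_0(r);
}
```

Starting from {5, 0xBEEF, 1, 2, 3, 4, 5, 6}, splat_low_word leaves 5 in all eight words.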

> +    pxor              m1, m1
> +    xor            tempq, tempq
> +    mov            tempd, %1
> +.loop
> +    pxor              m2, m2
> +%if %1 == 2 || (%2 == 8 && %1 <= 4)

There doesn't seem to be a %1 == 2 case.

> +    movd              m2, [dstq]                                               ; load data from source
> +%elif %1 == 4 || (%2 == 8 && %1 <= 8)
> +    movq              m2, [dstq]                                               ; load data from source
> +%else
> +    movdqu            m2, [dstq]                                               ; load data from source

You can use movu and movh here. They will expand to movdqu/movq and movq/movd depending
on whether you're using xmm or mmx registers.
Something like this:

%if %2 == 8 && %1 <= mmsize/2
    movh m2,[dstq]
%else
    movu m2,[dstq]
%endif

Same for the store version at the end of the function.
This only applies if you go with mmx registers for the 4x4 case, of course.
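
The effect of the movh/movu condition can be tabulated with a small C sketch. It assumes 8-bit
pixels occupy one byte and higher bit depths two bytes, and that mmsize is 8 for mmx and 16 for
xmm; load_bytes and row_bytes are hypothetical names used only for illustration.

```c
/* Bytes fetched by the suggested selection:
 *   %if %2 == 8 && %1 <= mmsize/2  -> movh (half a register)
 *   %else                          -> movu (a full register) */
static int load_bytes(int width, int bit_depth, int mmsize)
{
    if (bit_depth == 8 && width <= mmsize / 2)
        return mmsize / 2; /* movh */
    return mmsize;         /* movu */
}

/* Bytes actually occupied by one row of the block
 * (assumption: 1 byte per pixel at 8-bit, 2 bytes otherwise). */
static int row_bytes(int width, int bit_depth)
{
    return width * (bit_depth == 8 ? 1 : 2);
}
```

With mmx (mmsize 8) for 4x4, load_bytes(4, 8, 8) is 4 and load_bytes(4, 10, 8) is 8, both equal
to the row size. With xmm (mmsize 16), load_bytes(4, 8, 16) is 8 while the row is only 4 bytes,
which is presumably why the suggestion depends on using mmx registers for the 4x4 case.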

