[FFmpeg-devel] [Patch]x86/hevc : new idct + ASM

Pierre Edouard Lepere Pierre-Edouard.Lepere at insa-rennes.fr
Wed Jun 4 13:21:05 CEST 2014


Here's a new patch incorporating the suggestions:

- uses fewer registers
- the 4x4 function uses MMX registers
- uses immediates where applicable

Regards,
Pierre-Edouard Lepere

----- Original Message -----
From: "James Almer" <jamrial at gmail.com>
To: "FFmpeg development discussions and patches" <ffmpeg-devel at ffmpeg.org>
Sent: Monday, 2 June 2014 22:32:09
Subject: Re: [FFmpeg-devel] [Patch]x86/hevc : new idct + ASM

On 02/06/14 6:15 AM, Pierre Edouard Lepere wrote:
> +%macro TRANSFORM_DC_ADD 2
> +cglobal hevc_put_transform%1x%1_dc_add_%2, 4, 6, 4, dst, coeffs, stride, col_limit, temp

This should be 4, 5, 4: you're only using one temp register, not two.

> +    xor            tempw, tempw

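No need for this. The mov below overwrites the only bits that are used later. The same goes for the "xor tempq, tempq" and 
"pxor m2, m2" a few instructions below (writing tempd already zero-extends into tempq, and the movd/movq/movdqu loads overwrite m2).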

> +    mov            tempw, [coeffsq]
> +    add            tempw, 1
> +    sar            tempw, 1
> +    add            tempw, [add_%2]

Why use constants for a single value when you can use immediates?

%if %2 == 8
    add tempw, 32
%else
    add tempw, 8
%endif
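For checking the immediates: assuming the standard HEVC two-stage inverse transform, a DC-only block reduces to (coeff + 1) >> 1 after the first stage (the 64 scaling of the DC basis cancels the shift of 7), and the second stage adds 1 << (13 - bitdepth) before shifting by 14 - bitdepth. That gives 32 for 8-bit and 8 for 10-bit, matching the immediates above. The scalar C sketch below models this; the function names are hypothetical and not FFmpeg's actual C reference, samples are stored as uint16_t for every bit depth for simplicity, and an arithmetic right shift of negative values is assumed (as the asm's sar does).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of the NxN DC-only add; names are illustrative.
 * Samples are uint16_t for every bit depth for simplicity. */

/* Rounding offset of the second stage: 32 for 8-bit, 8 for 10-bit. */
static int dc_round(int bit_depth)
{
    assert(bit_depth >= 8 && bit_depth <= 12);
    return 1 << (13 - bit_depth);
}

/* Value added to every sample: ((((c + 1) >> 1) + round) >> (14 - bd)).
 * Assumes >> on negative ints is arithmetic, like the asm's sar. */
static int dc_value(int16_t coeff, int bit_depth)
{
    int dc = (coeff + 1) >> 1;
    return (dc + dc_round(bit_depth)) >> (14 - bit_depth);
}

/* Add the DC value to a size x size block, clipping to the pixel range.
 * stride is in samples, not bytes. */
static void dc_add(uint16_t *dst, int size, ptrdiff_t stride,
                   int16_t coeff, int bit_depth)
{
    const int dc  = dc_value(coeff, bit_depth);
    const int max = (1 << bit_depth) - 1;

    for (int y = 0; y < size; y++) {
        for (int x = 0; x < size; x++) {
            int v = dst[x] + dc;
            dst[x] = (uint16_t)(v < 0 ? 0 : v > max ? max : v);
        }
        dst += stride;
    }
}
```

For example, dc_value(1000, 8) computes (500 + 32) >> 6 = 8, which dc_add then adds to every sample with clipping to [0, 255].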

> +    sar            tempw, 14-%2
> +    movd              m0, tempd
> +    punpcklwd         m0, m0
> +    pshufd            m0, m0, 0

Use SPLATW here. It will come in handy if you use mmx registers for the 4x4 case, as Ronald suggested. Just make 
sure to declare the functions as mmxext and not mmx, since the latter doesn't have the pshuf* instructions and 
SPLATW will instead expand into punpck* instructions.

> +    pxor              m1, m1
> +    xor            tempq, tempq
> +    mov            tempd, %1
> +.loop
> +    pxor              m2, m2
> +%if %1 == 2 || (%2 == 8 && %1 <= 4)

There doesn't seem to be a %1 == 2 case.

> +    movd              m2, [dstq]                                               ; load data from source
> +%elif %1 == 4 || (%2 == 8 && %1 <= 8)
> +    movq              m2, [dstq]                                               ; load data from source
> +%else
> +    movdqu            m2, [dstq]                                               ; load data from source

You can use movu and movh here. They expand to movdqu/movq and movq/movd respectively, depending on whether 
you're using xmm or mmx registers.
Something like this:

%if %2 == 8 && %1 <= mmsize/2
    movh m2, [dstq]
%else
    movu m2, [dstq]
%endif

The same applies to the stores at the end of the function.
This only matters if you go with mmx registers for the 4x4 case, of course.
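The movh/movu selection suggested above can be tabulated with a small C model. It encodes only the expansions described in the review (movu: movdqu with xmm, movq with mmx; movh: movq with xmm, movd with mmx) and the "%2 == 8 && %1 <= mmsize/2" condition. The enum and function names are hypothetical, and the model does not cover wider rows that need more than one load.

```c
#include <assert.h>

/* Instruction that the movh/movu macros expand to, as described above. */
enum load_insn { LOAD_MOVD, LOAD_MOVQ, LOAD_MOVDQU };

/* Hypothetical helper: pick the per-row load for the DC add.
 * width: block width in pixels; bit_depth: 8 or 10;
 * mmsize: 8 for mmx registers, 16 for xmm registers. */
static enum load_insn row_load(int width, int bit_depth, int mmsize)
{
    assert(mmsize == 8 || mmsize == 16);
    /* %if %2 == 8 && %1 <= mmsize/2 -> movh, otherwise movu */
    int use_movh = bit_depth == 8 && width <= mmsize / 2;
    if (use_movh)
        return mmsize == 16 ? LOAD_MOVQ : LOAD_MOVD;   /* movh */
    return mmsize == 16 ? LOAD_MOVDQU : LOAD_MOVQ;     /* movu */
}

/* Bytes actually read by that load. */
static int load_bytes(enum load_insn insn)
{
    switch (insn) {
    case LOAD_MOVD: return 4;
    case LOAD_MOVQ: return 8;
    default:        return 16;
    }
}
```

For example, the 8-bit 4x4 mmx case picks movd (4 bytes, one row of 4 pixels), while the 10-bit 4x4 mmx case picks movq (8 bytes, 4 pixels of 2 bytes each). The half-width load for 8-bit rows presumably reflects that the bytes are widened to words before the add, so half a register of bytes fills a whole register of words.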
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel at ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-added-new-idct-and-first-idct-asm.patch
Type: text/x-patch
Size: 26816 bytes
Desc: not available
URL: <https://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20140604/acd8e080/attachment.bin>

