[FFmpeg-devel] [FFmpeg-cvslog] x86/me_cmp: port mmxext and sse2 sad functions to yasm

Clément Bœsch u at pkh.me
Wed Sep 17 13:18:12 CEST 2014


On Wed, Sep 17, 2014 at 11:41:32AM +0200, James Almer wrote:
> ffmpeg | branch: master | James Almer <jamrial at gmail.com> | Tue Sep 16 21:41:47 2014 -0300| [0456d169c469a79e305813d14c873fe698c8c572] | committer: Michael Niedermayer
> 
> x86/me_cmp: port mmxext and sse2 sad functions to yasm
> 
> Also add a missing c->pix_abs[0][0] initialization, and sse2 versions of
> sad16_x2, sad16_y2 and sad16_xy2 (15% to 20% faster than mmxext).
> Since the _xy2 versions are not bitexact, they are accordingly marked as
> approximate.
> 
> Signed-off-by: James Almer <jamrial at gmail.com>
> Signed-off-by: Michael Niedermayer <michaelni at gmx.at>
> 
> > http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=0456d169c469a79e305813d14c873fe698c8c572
> ---
> 
>  libavcodec/x86/me_cmp.asm    |  330 ++++++++++++++++++++++++++++++++++++++++++
>  libavcodec/x86/me_cmp_init.c |  203 +++++++-------------------
>  2 files changed, 379 insertions(+), 154 deletions(-)
> 
> diff --git a/libavcodec/x86/me_cmp.asm b/libavcodec/x86/me_cmp.asm
> index b0741f3..27176f4 100644
> --- a/libavcodec/x86/me_cmp.asm
> +++ b/libavcodec/x86/me_cmp.asm
> @@ -23,6 +23,10 @@
>  
>  %include "libavutil/x86/x86util.asm"
>  
> +SECTION_RODATA
> +
> +cextern pb_1
> +
>  SECTION .text
>  
>  %macro DIFF_PIXELS_1 4
> @@ -465,3 +469,329 @@ cglobal hf_noise%1, 3,3,0, pix1, lsize, h
>  INIT_MMX mmx
>  HF_NOISE 8
>  HF_NOISE 16
> +
> +;---------------------------------------------------------------------------------------
> +;int ff_sad_<opt>(MpegEncContext *v, uint8_t *pix1, uint8_t *pix2, int stride, int h);
> +;---------------------------------------------------------------------------------------
> +INIT_MMX mmxext
> +cglobal sad8, 4, 4, 0, v, pix1, pix2, stride
> +    movu      m2, [pix2q]
> +    movu      m1, [pix2q+strideq]
> +    psadbw    m2, [pix1q]
> +    psadbw    m1, [pix1q+strideq]
> +    paddw     m2, m1
> +
> +%rep 3
> +    lea    pix1q, [pix1q+strideq*2]
> +    lea    pix2q, [pix2q+strideq*2]
> +    movu      m0, [pix2q]
> +    movu      m1, [pix2q+strideq]
> +    psadbw    m0, [pix1q]
> +    psadbw    m1, [pix1q+strideq]
> +    paddw     m2, m0
> +    paddw     m2, m1
> +%endrep
> +    movd     eax, m2
> +    RET
> +

Sorry to notice this only now, but... what happened to the h parameter?
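
To make the concern concrete: the prototype comment in the patch lists five
arguments ending in "int h", but cglobal declares only four, and the body is
unrolled to a fixed 8 rows (2 rows up front, then %rep 3 of 2 rows each). A
minimal C sketch of the difference follows. The names sad8_ref and sad8_fixed8
are hypothetical and are not FFmpeg functions; the context pointer is left out
for brevity.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical reference: sum of absolute differences over an
 * 8-pixel-wide block, honouring the caller-supplied height h. */
static int sad8_ref(const uint8_t *pix1, const uint8_t *pix2,
                    int stride, int h)
{
    int sum = 0;
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 8; x++)
            sum += abs(pix1[x] - pix2[x]);
        pix1 += stride;
        pix2 += stride;
    }
    return sum;
}

/* Hypothetical model of the unrolled asm quoted above: it always
 * processes 8 rows and never reads h. */
static int sad8_fixed8(const uint8_t *pix1, const uint8_t *pix2,
                       int stride, int h)
{
    (void)h; /* ignored, as in the quoted sad8 */
    return sad8_ref(pix1, pix2, stride, 8);
}
```

Under these assumptions the two only agree when h == 8; for any other height
the fixed version sums the wrong number of rows.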

[...]

-- 
Clément B.