[Ffmpeg-devel] [PATCH] SSE2 implementation for motion_est_mmx.c

Michael Niedermayer michaelni
Mon Jun 19 23:10:28 CEST 2006


Hi

On Tue, Jun 20, 2006 at 12:44:25AM +0800, jserv at linux2.cc.ntu.edu.tw wrote:
> Hello list,
> 
>   I have implemented SSE2 implementation of motion_est_mmx.c, and it
> works slightly faster than the original MMX2 implementation. Currently,
> it depends on MMX intrinsics, so that HAVE_BUILTIN_VECTOR is required,
> which means you have to build the code with GCC and Intel C++ compiler.

your code has little in common with what the functions do which you replaced,
or in other words it cannot work, if its faster or not is consequentially
irrelevant


[...]

>   *
>   * This library is free software; you can redistribute it and/or
>   * modify it under the terms of the GNU Lesser General Public
> @@ -22,6 +23,29 @@
>  #include "../dsputil.h"
>  #include "mmx.h"
>  
> +#if defined(HAVE_BUILTIN_VECTOR)

i would prefer if intrinsic code could be kept in seperate files, away from
non intrinsics code, this makes conditional compilation easier and cleaner


> +#include <inttypes.h>
> +#ifdef __GNUC__
> +  #ifndef __forceinline
> +    #define __forceinline __attribute__((__always_inline__)) inline

you must not define anything beginning with __, these symbols are reserved
in C
furthermore we have always_inline which does already what you do above


[...]

> +static void __attribute__((constructor)) mpeg2_MC_sse_ctor()
> +{
> +    const_1_16_bytes = _mm_set1_epi16(1);
> +}

code like this is not accpetable, it will probably fail if SSE2 is not
supported, and even then it probably will cause problems with some more
obscure systems


[...]

> +static void sad16_sse2(uint8_t* dest, const uint8_t* ref, const int stride, int height)
> +{
> +	const uint8_t *edx = ref;
> +	uint8_t *ecx = dest;
> +	int esi = height;
> +	int eax = stride;
> +	int edi = eax + eax;

if you use "random" identifers like the register names then you could just
write the whole as an asm() block and avoid the troubble, portability and
performance issues on various compilers with intrinsics
the main 2 advantages of intrinsics over asm() code is that it allows writing
code which is more readable and that it allows the compiler to reorder the
instructions, gcc isnt overly good at the later though i heared it has been
improved ... and with names like that you completely defeat the first

btw tabs are not allowed in ffmpeg, please use spaces instead


> +	for (; esi; edx += edi, ecx += edi,esi -= 2) {
> +		__m128i xmm0, xmm1, xmm2, xmm3;
> +		xmm0 = _mm_loadu_si128((__m128i*) edx);
> +		xmm1 = _mm_loadu_si128((__m128i*) (edx + eax));
> +		xmm2 = _mm_load_si128((__m128i*) ecx);
> +		xmm3 = _mm_load_si128((__m128i*) (ecx + eax));
> +		xmm0 = _mm_avg_epu8(xmm0, xmm2);
> +		xmm1 = _mm_avg_epu8(xmm1, xmm3);
> +		_mm_store_si128((__m128i*) ecx, xmm0);
> +		_mm_store_si128((__m128i*) (ecx + eax), xmm1);
> +	}

as already said at the top, your code is not doing what the functions are
supposed to do, your code in this case simply averages 2 16xY blocks while
sad16* is supposed to calculate the sum of absolute differences of 2 blocks


[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

In the past you could go to a library and read, borrow or copy any book
Today you'd get arrested for mere telling someone where the library is




More information about the ffmpeg-devel mailing list