[Ffmpeg-devel] [PATCH] MMX optimization for get_amv() in libavcodec/h263.c

Michael Niedermayer michaelni
Thu Apr 19 22:09:45 CEST 2007


Hi

On Thu, Apr 19, 2007 at 05:26:45PM +0400, Andrew Savchenko wrote:
> Hello,
> 
> I optimized one of the FIXMEs in get_amv() in h263.c.
> Unfortunately, I could not find or create video material where this 
> function is used during decoding, so synthetic tests were used. If 

xvid and the reference mpeg4 encoder should support global motion estimation
(it's a useless feature quality-wise, but people occasionally use it
nonetheless)


> someone can provide a link to such a video or point me to a way to 
> create one, that would be great.
> 
> The changes made for the synthetic test benchmarks are in 
> h263_syntetic.diff.
> 
> The first patch (h263_mmx_16bit.diff) uses 16 bits for the sum 
> "variables", so operations such as shifts and summation can be 
> performed on 4 values by a single instruction. But I'm afraid that 
> in real decoding the sum may overflow, so I made a second patch 
> (h263_mmx_32bit.diff) to eliminate this problem. Obviously it is 
> slower, because MMX instructions can operate on only two 32-bit 
> values at a time.
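A minimal C sketch of the overflow concern, assuming the loop being optimized sums (v >> shift) over the 16x16 macroblock with v = mb_v + dx*x + dy*y. amv_sum32() and amv_sum16() are hypothetical names, not code from either patch; the 16-bit variant only models a word-sized accumulator that wraps modulo 2^16, as a plain paddw lane would.

```c
#include <stdint.h>

/* Hypothetical reference loop, assumed to match the accumulation in
 * get_amv(): sum of (v >> shift) over a 16x16 macroblock, where v grows
 * by dx per column and by dy per row. Relies on >> being an arithmetic
 * shift for negative values, as it is with gcc on x86. */
static int32_t amv_sum32(int mb_v, int dx, int dy, int shift)
{
    int32_t sum = 0;
    int x, y;

    for (y = 0; y < 16; y++) {
        int v = mb_v + dy * y;
        for (x = 0; x < 16; x++) {
            sum += v >> shift;
            v   += dx;
        }
    }
    return sum;
}

/* Same loop with a 16-bit accumulator: every addition wraps modulo 2^16,
 * like a 16-bit MMX lane would, so the result is only correct when the
 * true sum fits in int16_t. */
static int16_t amv_sum16(int mb_v, int dx, int dy, int shift)
{
    uint16_t sum = 0;
    int x, y;

    for (y = 0; y < 16; y++) {
        int v = mb_v + dy * y;
        for (x = 0; x < 16; x++) {
            sum = (uint16_t)(sum + (uint16_t)(v >> shift));
            v  += dx;
        }
    }
    /* Reinterpret as signed; gcc defines this conversion as modulo 2^16. */
    return (int16_t)sum;
}
```

With mb_v = 100 and dx = dy = shift = 0 both functions return 25600; with mb_v = 1000 the 32-bit version returns 256000, while the 16-bit one wraps to -6144.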
> 
> Testing was done on an Athlon XP. The inner loop in the 1st patch is 
> fully unrolled, because this gave the best performance compared to 
> the untouched and partially unrolled loops (probably due to better 
> pipeline utilization). 
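The "4 values per instruction" layout can be modelled in plain C as four lanes, each holding one column of the current group of four, advanced by 4*dx per step, with the lanes added together at the end. This is only an illustration of that data layout, assuming the summed terms are (v >> shift) with v = mb_v + dx*x + dy*y; amv_sum_lanes() is a hypothetical name, and the lanes use 32-bit ints so the result matches the plain double loop exactly.

```c
#include <stdint.h>

/* Hypothetical 4-lane version of the 16x16 accumulation of (v >> shift).
 * lane[i] holds column x+i of row y; each step handles four columns, the
 * way one psraw/paddw on four words would in a 16-bit MMX version. */
static int32_t amv_sum_lanes(int mb_v, int dx, int dy, int shift)
{
    int32_t acc[4] = { 0, 0, 0, 0 };   /* per-lane partial sums */
    int x, y, i;

    for (y = 0; y < 16; y++) {
        int lane[4];
        for (i = 0; i < 4; i++)
            lane[i] = mb_v + dy * y + dx * i;  /* columns 0..3 of row y */

        for (x = 0; x < 16; x += 4) {          /* 4 steps cover 16 columns */
            for (i = 0; i < 4; i++) {
                acc[i]  += lane[i] >> shift;
                lane[i] += 4 * dx;             /* move on to column x+4+i */
            }
        }
    }
    /* Horizontal add of the four lanes. */
    return acc[0] + acc[1] + acc[2] + acc[3];
}
```

For example, amv_sum_lanes(0, 1, 16, 2) returns 8064, the same as summing (v >> 2) for v = 0..255 directly.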
> 
> In the 2nd patch the inner loop is only partially unrolled; further 
> unrolling brought no additional performance beyond measurement 
> error. Also, %%eax was used for the multiplication, because MMX can 
> multiply only 16-bit values and can't unpack a *signed* value from 
> word to doubleword.
> 
> Benchmark results summary (oprofile was used as the profiler):
> 
>           mean value   standard deviation
> C:             38591                  322
> mmx_16:         5790                   38
> mmx_32:        10836                   66
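The speedups implied by the table are plain ratios of the means, assuming the numbers are comparable oprofile sample counts for the same workload (the unit is not stated); speedup() is only a hypothetical helper.

```c
/* Speedup of an optimized version relative to a baseline, both measured
 * with the same profiler setup (lower numbers mean faster code). */
static double speedup(double baseline, double optimized)
{
    return baseline / optimized;
}
```

speedup(38591, 5790) is about 6.67 and speedup(38591, 10836) is about 3.56, so the 32-bit version takes roughly 1.9 times as long as the 16-bit one. All three standard deviations are below 1% of their means, so these differences are far larger than the reported spread.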
> 
> So, if the sum is known to fit in 16 bits (in fact it can be slightly 
> larger, up to 17 bits, but it is hard to set an exact threshold), the 
> 1st patch is strongly preferred.

the 16 bit code could be used with a simple check whether the values
fit, and a fallback to the C version otherwise
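A minimal sketch of such a check, assuming the accumulation sums (v >> shift) over the 16x16 block with v = mb_v + dx*x + dy*y, and that the 16-bit code keeps both v and the running sum in 16-bit lanes. Because v is linear, its extremes over the macroblock are at the four corners, so the range of every term, and therefore of the whole sum, can be bounded before choosing a path. amv_fits_16bit() is a hypothetical name, not an existing ffmpeg function.

```c
#include <stdint.h>

/* Hypothetical pre-check for the 16-bit path: returns 1 if every v in the
 * 16x16 block and the sum of all 256 terms (v >> shift) are guaranteed to
 * fit in int16_t, 0 if the C version should be used instead. */
static int amv_fits_16bit(int mb_v, int dx, int dy, int shift)
{
    /* v = mb_v + dx*x + dy*y is linear, so its extremes are at the corners.
     * 64-bit arithmetic keeps the check itself from overflowing. */
    int64_t c[4], lo, hi;
    int i;

    c[0] = mb_v;
    c[1] = (int64_t)mb_v + 15 * (int64_t)dx;
    c[2] = (int64_t)mb_v + 15 * (int64_t)dy;
    c[3] = (int64_t)mb_v + 15 * (int64_t)dx + 15 * (int64_t)dy;

    lo = hi = c[0];
    for (i = 1; i < 4; i++) {
        if (c[i] < lo) lo = c[i];
        if (c[i] > hi) hi = c[i];
    }
    if (lo < INT16_MIN || hi > INT16_MAX)
        return 0;                 /* v itself does not fit in a word */

    /* >> is monotonic, so every term lies in [lo >> shift, hi >> shift];
     * the total of 256 terms, and every partial sum, then stays in range,
     * whether the adds wrap (paddw) or saturate (paddsw). */
    lo >>= shift;
    hi >>= shift;
    return 256 * lo >= INT16_MIN && 256 * hi <= INT16_MAX;
}
```

The decoder would then take the 16-bit path only when amv_fits_16bit() returns 1 and keep the existing C loop otherwise. Since the bound is conservative, some blocks whose sum would in fact fit still take the C path.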


> 
> P.S. While not related to the patch, I'd like to ask some 
> development-related questions.
> 
> Can someone point me to an SSE instruction set guide from AMD? Does 
> one even exist? I'm not sure that Intel's descriptions and 
> performance recommendations for SSE are applicable to AMD 
> processors. For now I only have AMD's guides for the MMX, 3DNow! and 
> MMXext/3DNow!ext instruction sets and the Athlon optimization guide 
> (pub. 20726, 21928, 22466 and 22007 respectively).

try http://www.agner.org/optimize/
and see doc/optimization.txt in ffmpeg svn


> 
> Is there any convenient way to debug inline asm using gdb or 
> similar? Is it possible to step through asm instructions, examine 
> registers and so on?

yes gdb can do this IIRC



> --- mplayer/libavcodec/h263.c.orig	2007-04-10 11:06:58.000000000 +0400
> +++ mplayer/libavcodec/h263.c	2007-04-18 23:11:41.000000000 +0400
> @@ -4231,6 +4231,10 @@
>      static int8_t quant_tab[4] = { -1, -2, 1, 2 };
>      const int xy= s->mb_x + s->mb_y * s->mb_stride;
>  
> +    int volatile MX, MY;
> +    MX = get_amv(s, 0);
> +    MY = get_amv(s, 1);

why volatile?


[...]
>  static inline int get_amv(MpegEncContext *s, int n){
> +#ifndef HAVE_MMX
>      int x, y, mb_v, sum, dx, dy, shift;
> +#else /* HAVE_MMX */
> +    int mb_v, sum, dx, dy, shift;
> +#endif /* HAVE_MMX */

MMX-specific code should be in libavcodec/i386/...


[...]
> +        asm volatile(
> +        "pxor       %%mm5, %%mm5 \n"  //sum=0
> +        "movd       %[st], %%mm3 \n"  //shift
                       ^^^^
not gcc 2.95 compatible
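For reference: symbolic operand names such as %[st] were, as far as I know, only added in gcc 3.1, while positional operands (%0, %1, ...) also work with gcc 2.95. A minimal sketch of the positional style, using a plain arithmetic shift rather than the MMX code from the patch; sar_positional() is a hypothetical helper, and the asm is only compiled on x86 with gcc.

```c
/* Hypothetical helper: arithmetic right shift written with positional
 * asm operands only, so no %[name] syntax is needed. */
static int sar_positional(int v, int shift)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int r;
    /* %0 = r (output), %1 = v (tied to %0 by "0"), %2 = shift (in %ecx).
     * A variable shift count must be in %cl for sarl. */
    __asm__("sarl %%cl, %0"
            : "=r"(r)
            : "0"(v), "c"(shift)
            : "cc");
    return r;
#else
    return v >> shift;  /* portable fallback (arithmetic shift on gcc) */
#endif
}
```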


[...]

> +        "packssdw   %%mm1, %%mm1 \n"  //0 0 0 dx

not needed?

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

It is dangerous to be right in matters on which the established authorities
are wrong. -- Voltaire


