[MPlayer-dev-eng] [PATCH] SSE2-optimized libmpeg2 motion compensation

Guillaume POIRIER poirierg at gmail.com
Wed Jun 14 13:48:30 CEST 2006


Hi,

On 6/14/06, jserv at linux2.cc.ntu.edu.tw <jserv at linux2.cc.ntu.edu.tw> wrote:
> Hello list,
>
>   I recently implemented SSE2-optimized libmpeg2 motion compensation, and
> I think that it might be useful to MPlayer. I have attached the patch to
> this mail.

Quick feedback: since you are using intrinsics, you should make your
code depend on the availability of mmintrin.h, xmmintrin.h, and
emmintrin.h

Off the top of my head, I don't know whether the existing configure
script checks for these headers, but it's rather easy to add the checks
(just look at the way it's done for 3dnow).
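
A minimal sketch of the kind of test code such a check could try to
compile, assuming the check works by building a small C file and treating
a compile failure as "headers not available". The function name
sse2_probe is hypothetical and is not taken from configure or from the
patch:

```c
/* Hypothetical configure probe: this only compiles if the three
 * intrinsics headers exist and the compiler accepts SSE2 intrinsics. */
#include <mmintrin.h>   /* MMX  */
#include <xmmintrin.h>  /* SSE  */
#include <emmintrin.h>  /* SSE2 */

/* Returns 1 if a trivial SSE2 operation gives the expected result. */
static int sse2_probe(void)
{
    __m128i a = _mm_set1_epi8(10);
    __m128i b = _mm_set1_epi8(20);
    /* pavgb: rounded unsigned byte average, (10 + 20 + 1) >> 1 = 15 */
    __m128i avg = _mm_avg_epu8(a, b);
    /* Compare all 16 lanes against 15 and collect one bit per lane. */
    int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(avg, _mm_set1_epi8(15)));
    return mask == 0xFFFF;
}
```

Adding "int main(void) { return sse2_probe() ? 0 : 1; }" would turn this
into a standalone test program whose exit status could also be checked at
run time.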


>   The performance gain over the original MMXext(MMX2)-based motion
> compensation implementation in libmpeg2 is as follows:
>
> Sample
> ======
> MPEG-PS file format detected.
> VIDEO:  MPEG2  720x480  (aspect 2)  29.970 fps  2376.0 kbps (297.0 kbyte/s)
>
> MMXext-based
> ============
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls  ms/call  ms/call  name
>  15.58     12.83    12.83  3013243     0.00     0.00  mmxext_idct
>  14.70     24.94    12.11   515616     0.02     0.02  MC_put_xy_16_mmxext
>  12.40     35.15    10.21    56139     0.18     0.18  fast_memcpy
>   6.69     40.66     5.51  1826892     0.00     0.00  slice_intra_DCT
>   6.34     45.88     5.22  1500758     0.00     0.00  MC_put_o_8_mmxext
>   5.44     50.36     4.48  1758006     0.00     0.00  get_non_intra_block
>   3.95     53.61     3.25    30480     0.11     2.16  mpeg2_slice
>   3.28     56.31     2.70  1069690     0.00     0.03  motion_fr_frame_420
>   3.21     58.95     2.64   112816     0.02     0.02  MC_avg_xy_16_mmxext
>   3.14     61.54     2.59  1826892     0.00     0.01  mpeg2_idct_copy_mmxext
>
> SSE2-based
> ==========
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls  ms/call  ms/call  name
>  16.32     13.24    13.24  3013243     0.00     0.00  mmxext_idct
>  12.43     23.33    10.09    56139     0.18     0.18  fast_memcpy
>  10.67     31.99     8.66   515616     0.02     0.02  MC_put_xy_16_sse2
>   7.16     37.80     5.81  1826892     0.00     0.00  slice_intra_DCT
>   6.85     43.36     5.56  1500758     0.00     0.00  MC_put_o_8_sse2
>   5.73     48.01     4.65  1758006     0.00     0.00  get_non_intra_block
>   4.97     52.04     4.03    30480     0.13     2.10  mpeg2_slice
>   3.27     54.69     2.65  1069690     0.00     0.03  motion_fr_frame_420
>   3.24     57.32     2.63   205987     0.01     0.01  MC_put_x_16_sse2
>   3.11     59.84     2.52  1826892     0.00     0.01  mpeg2_idct_copy_mmxext
>   2.93     62.22     2.38   112816     0.02     0.02  MC_avg_xy_16_sse2
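
To make the two profiles easier to compare, here is a small sketch that
computes the relative change in the "self seconds" column for the motion
compensation functions that appear in both listings. The helper names are
made up for illustration; the numbers are copied from the tables above:

```c
#include <stdio.h>

/* Relative change in self time, in percent; negative means faster. */
static double self_time_change_pct(double before_s, double after_s)
{
    return (after_s - before_s) / before_s * 100.0;
}

/* Prints one comparison line for a function present in both profiles. */
static void report(const char *name, double mmxext_s, double sse2_s)
{
    printf("%-14s %6.2f s -> %6.2f s  (%+.1f%%)\n",
           name, mmxext_s, sse2_s,
           self_time_change_pct(mmxext_s, sse2_s));
}

/* Self seconds taken from the MMXext and SSE2 listings quoted above. */
static void report_all(void)
{
    report("MC_put_xy_16", 12.11, 8.66);
    report("MC_put_o_8",    5.22, 5.56);
    report("MC_avg_xy_16",  2.64, 2.38);
}
```

Calling report_all() shows MC_put_xy_16 dropping by roughly 28% and
MC_avg_xy_16 by roughly 10%, while MC_put_o_8 comes out slightly slower.
Differences of a few tenths of a second may well be within measurement
noise, so these figures are only a rough guide.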

What CPU did you run your tests on? What compiler did you use?
Did you check whether your code compiles OK with icc (Intel's compiler)?

Guillaume

-- 
"Success consists of going from failure to failure without loss of enthusiasm."
 -- Winston Churchill


