[MPlayer-dev-eng] [PATCH] SSE2-optimized libmpeg2 motion compensation
Guillaume POIRIER
poirierg at gmail.com
Wed Jun 14 13:48:30 CEST 2006
Hi,
On 6/14/06, jserv at linux2.cc.ntu.edu.tw <jserv at linux2.cc.ntu.edu.tw> wrote:
> Hello list,
>
> Recently, I implemented SSE2-optimized libmpeg2 motion compensation, and
> I think that it might be useful to MPlayer. I have attached the patch to
> this mail.
Quick feedback: since you are using intrinsics, you should make your
code depend on the availability of mmintrin.h, xmmintrin.h, and
emmintrin.h.
Off the top of my head, I don't know whether the existing configure script
checks for these, but it's rather easy to add such checks (just look at
the way it's done for 3dnow).
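To make the point concrete, here is a minimal sketch of guarding intrinsics
code on header availability. It is not taken from the patch: the names
HAVE_SSE2_INTRINSICS, avg_row16, avg_row16_c and avg_row16_selftest are
made up for illustration, and the guard uses the __SSE2__ macro that
GCC-compatible compilers predefine when SSE2 is enabled. A real configure
check would more likely try to compile a small program that includes the
three headers and then define its own macro.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: __SSE2__ is predefined by GCC-compatible compilers when
 * SSE2 code generation is enabled. A configure-generated macro would
 * normally replace this test. */
#if defined(__SSE2__)
#include <mmintrin.h>   /* MMX intrinsics */
#include <xmmintrin.h>  /* SSE intrinsics */
#include <emmintrin.h>  /* SSE2 intrinsics */
#define HAVE_SSE2_INTRINSICS 1
#else
#define HAVE_SSE2_INTRINSICS 0
#endif

/* Scalar reference: rounded-up average of two 16-byte rows,
 * dst[i] = (a[i] + b[i] + 1) >> 1. */
static void avg_row16_c(uint8_t *dst, const uint8_t *a, const uint8_t *b)
{
    for (int i = 0; i < 16; i++)
        dst[i] = (uint8_t)((a[i] + b[i] + 1) >> 1);
}

/* Same operation, using SSE2 intrinsics only when the headers are
 * available; otherwise it falls back to the scalar version. */
static void avg_row16(uint8_t *dst, const uint8_t *a, const uint8_t *b)
{
    assert(dst && a && b);
#if HAVE_SSE2_INTRINSICS
    __m128i va = _mm_loadu_si128((const __m128i *)a);  /* unaligned load */
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)dst, _mm_avg_epu8(va, vb));
#else
    avg_row16_c(dst, a, b);
#endif
}

/* Returns 1 if the guarded path matches the scalar reference. */
static int avg_row16_selftest(void)
{
    uint8_t a[16], b[16], ref[16], out[16];
    for (int i = 0; i < 16; i++) {
        a[i] = (uint8_t)(i * 17);       /* 0, 17, ..., 255 */
        b[i] = (uint8_t)(255 - i * 3);  /* 255, 252, ..., 210 */
    }
    avg_row16_c(ref, a, b);
    avg_row16(out, a, b);
    for (int i = 0; i < 16; i++)
        if (ref[i] != out[i])
            return 0;
    return 1;
}
```

With this shape the file still builds on a compiler or target without the
headers, and the scalar path doubles as a reference for checking the
intrinsics path.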
> The performance gain over the original MMXext(MMX2)-based motion
> compensation implementation in libmpeg2 is as follows:
>
> Sample
> ======
> MPEG-PS file format detected.
> VIDEO: MPEG2 720x480 (aspect 2) 29.970 fps 2376.0 kbps (297.0 kbyte/s)
>
> MMXext-based
> ============
> % cumulative self self total
> time seconds seconds calls ms/call ms/call name
> 15.58 12.83 12.83 3013243 0.00 0.00 mmxext_idct
> 14.70 24.94 12.11 515616 0.02 0.02 MC_put_xy_16_mmxext
> 12.40 35.15 10.21 56139 0.18 0.18 fast_memcpy
> 6.69 40.66 5.51 1826892 0.00 0.00 slice_intra_DCT
> 6.34 45.88 5.22 1500758 0.00 0.00 MC_put_o_8_mmxext
> 5.44 50.36 4.48 1758006 0.00 0.00 get_non_intra_block
> 3.95 53.61 3.25 30480 0.11 2.16 mpeg2_slice
> 3.28 56.31 2.70 1069690 0.00 0.03 motion_fr_frame_420
> 3.21 58.95 2.64 112816 0.02 0.02 MC_avg_xy_16_mmxext
> 3.14 61.54 2.59 1826892 0.00 0.01 mpeg2_idct_copy_mmxext
>
> SSE2-based
> ==========
> % cumulative self self total
> time seconds seconds calls ms/call ms/call name
> 16.32 13.24 13.24 3013243 0.00 0.00 mmxext_idct
> 12.43 23.33 10.09 56139 0.18 0.18 fast_memcpy
> 10.67 31.99 8.66 515616 0.02 0.02 MC_put_xy_16_sse2
> 7.16 37.80 5.81 1826892 0.00 0.00 slice_intra_DCT
> 6.85 43.36 5.56 1500758 0.00 0.00 MC_put_o_8_sse2
> 5.73 48.01 4.65 1758006 0.00 0.00 get_non_intra_block
> 4.97 52.04 4.03 30480 0.13 2.10 mpeg2_slice
> 3.27 54.69 2.65 1069690 0.00 0.03 motion_fr_frame_420
> 3.24 57.32 2.63 205987 0.01 0.01 MC_put_x_16_sse2
> 3.11 59.84 2.52 1826892 0.00 0.01 mpeg2_idct_copy_mmxext
> 2.93 62.22 2.38 112816 0.02 0.02 MC_avg_xy_16_sse2
What CPU did you run your tests on? Which compiler did you use?
Did you check whether your code compiles OK with icc (Intel's compiler)?
Guillaume
--
"Success consists of going from failure to failure without loss of enthusiasm."
-- Winston Churchill