[MPlayer-dev-eng] [RFC] disable fastmemcpy on x86-64 by default
Attila Kinali
attila at kinali.ch
Sun May 27 22:47:55 CEST 2007
On Sun, 27 May 2007 18:19:48 +0200
Reimar Döffinger <Reimar.Doeffinger at stud.uni-karlsruhe.de> wrote:
> Hello,
> since SSE is part of the x86-64 architecture, at least glibc makes use
> of it for its memcpy and some quick (and imprecise) tests indicate that
> it's at least not slower.
> So what do you think about attached patch? Can someone do more concise
> benchmarks?
Here are some benchmarks:
System:
attila at jashugan:~ # uname -a
Linux jashugan 2.6.18 #1 Wed Sep 27 17:50:21 CEST 2006 x86_64 GNU/Linux
attila at jashugan:~ # cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 55
model name : AMD Athlon(tm) 64 Processor 3700+
stepping : 2
cpu MHz : 2202.856
cache size : 1024 KB
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt lm 3dnowext 3dnow pni lahf_lm
bogomips : 4409.53
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp
attila at jashugan:~ # dpkg -s libc6|grep Version
Version: 2.3.6.ds1-13
attila at jashugan:~ # dpkg -s gcc|grep Version
Version: 4:4.1.1-15
attila at jashugan:~ # free -m
             total       used       free     shared    buffers     cached
Mem:          2012       1952         59          0          4       1125
-/+ buffers/cache:        822       1190
Swap:         7812          0       7812
Graphics card is a Matrox G550, used vo: xmga
All benchmarks are the best of 3 runs after one burn-in run. The files
were read from a local SATA disk (or, after the burn-in, from RAM).
Standard parameters: -quiet -nosound -benchmark
-------------------------------------------------
Benchmark 1:
(first 2000 frames only)
MPlayer dev-SVN-r23390-4.1.2 (C) 2000-2007 MPlayer Team
CPU: AMD Athlon(tm) 64 Processor 3700+ (Family: 15, Model: 55, Stepping: 2)
CPUflags: MMX: 1 MMX2: 1 3DNow: 1 3DNow2: 1 SSE: 1 SSE2: 1
Compiled for x86 CPU with extensions: MMX MMX2 3DNow 3DNowEx SSE SSE2
Playing /tmp/[Ayako]_Seto_no_Hanayome_-_01_(H264)_[951E16B9].mkv.
[mkv] Track ID 1: video (V_MPEG4/ISO/AVC), -vid 0
[mkv] Track ID 2: audio (A_AAC), -aid 0, -alang und
[mkv] Track ID 3: subtitles (S_TEXT/ASS), -sid 0, -slang und
[mkv] Track ID 4: subtitles (S_TEXT/UTF8), -sid 1, -slang und
[mkv] Will play video track 1.
[mkv] No audio track found/wanted.
Matroska file format detected.
VIDEO: [avc1] 1280x720 24bpp 23.976 fps 0.0 kbps ( 0.0 kbyte/s)
==========================================================================
Opening video decoder: [ffmpeg] FFmpeg's libavcodec codec family
Selected video codec: [ffh264] vfm: ffmpeg (FFmpeg H.264)
==========================================================================
Audio: no sound
Starting playback...
VDec: vo config request - 1280 x 720 (preferred colorspace: Planar YV12)
VDec: using Planar YV12 as output csp (no 0)
Movie-Aspect is 1.78:1 - prescaling to correct movie aspect.
VO: [xmga] 1280x720 => 1280x720 Planar YV12
[MGA] Using 3 buffers.
w/o patch:
BENCHMARKs: VC: 19.974s VO: 44.601s A: 0.000s Sys: 0.093s = 64.668s
BENCHMARK%: VC: 30.8864% VO: 68.9698% A: 0.0000% Sys: 0.1437% = 100.0000%
w/ patch:
BENCHMARKs: VC: 19.889s VO: 44.503s A: 0.000s Sys: 0.091s = 64.484s
BENCHMARK%: VC: 30.8437% VO: 69.0146% A: 0.0000% Sys: 0.1416% = 100.0000%
-------------------------------------------------
Benchmark 2:
MPlayer dev-SVN-r23390-4.1.2 (C) 2000-2007 MPlayer Team
CPU: AMD Athlon(tm) 64 Processor 3700+ (Family: 15, Model: 55, Stepping: 2)
CPUflags: MMX: 1 MMX2: 1 3DNow: 1 3DNow2: 1 SSE: 1 SSE2: 1
Compiled for x86 CPU with extensions: MMX MMX2 3DNow 3DNowEx SSE SSE2
Playing /tmp/Inuyasha Movie Commercial 01 Dvd Rip.avi.
AVI file format detected.
[aviheader] Video stream found, -vid 0
[aviheader] Audio stream found, -aid 1
VIDEO: [DIVX] 720x480 24bpp 23.976 fps 4038.1 kbps (492.9 kbyte/s)
Clip info:
Software: Nandub v1.0rc2
==========================================================================
Opening video decoder: [ffmpeg] FFmpeg's libavcodec codec family
Selected video codec: [ffodivx] vfm: ffmpeg (FFmpeg MPEG-4)
==========================================================================
Audio: no sound
Starting playback...
[mpeg4 @ 0xcedc20]looks like this file was encoded with (divx4/(old)xvid/opendivx) -> forcing low_delay flag
VDec: vo config request - 720 x 480 (preferred colorspace: Planar YV12)
VDec: using Planar YV12 as output csp (no 0)
Movie-Aspect is undefined - no prescaling applied.
VO: [xmga] 720x480 => 720x480 Planar YV12
[MGA] Using 3 buffers.
w/o patch:
BENCHMARKs: VC: 9.980s VO: 0.003s A: 0.000s Sys: 0.046s = 10.029s
BENCHMARK%: VC: 99.5084% VO: 0.0336% A: 0.0000% Sys: 0.4580% = 100.0000%
w/ patch:
BENCHMARKs: VC: 8.833s VO: 0.003s A: 0.000s Sys: 0.047s = 8.883s
BENCHMARK%: VC: 99.4307% VO: 0.0367% A: 0.0000% Sys: 0.5326% = 100.0000%
-------------------------------------------------
Benchmark 3:
MPlayer dev-SVN-r23390-4.1.2 (C) 2000-2007 MPlayer Team
CPU: AMD Athlon(tm) 64 Processor 3700+ (Family: 15, Model: 55, Stepping: 2)
CPUflags: MMX: 1 MMX2: 1 3DNow: 1 3DNow2: 1 SSE: 1 SSE2: 1
Compiled for x86 CPU with extensions: MMX MMX2 3DNow 3DNowEx SSE SSE2
Playing /tmp/[AnY-Spork] Iriya no Sora, UFO no Natsu - 1 [DVD-MP3][46B1F913].avi.
AVI file format detected.
[aviheader] Video stream found, -vid 0
[aviheader] Audio stream found, -aid 1
VIDEO: [XVID] 640x480 24bpp 23.976 fps 1064.2 kbps (129.9 kbyte/s)
Clip info:
Software: VirtualDubMod 1.5.10.1 (build 2366/release)
==========================================================================
Opening video decoder: [ffmpeg] FFmpeg's libavcodec codec family
Selected video codec: [ffodivx] vfm: ffmpeg (FFmpeg MPEG-4)
==========================================================================
Audio: no sound
Starting playback...
VDec: vo config request - 640 x 480 (preferred colorspace: Planar YV12)
VDec: using Planar YV12 as output csp (no 0)
Movie-Aspect is 1.33:1 - prescaling to correct movie aspect.
VO: [xmga] 640x480 => 640x480 Planar YV12
[MGA] Using 3 buffers.
w/o patch:
BENCHMARKs: VC: 307.732s VO: 0.126s A: 0.000s Sys: 1.025s = 308.883s
BENCHMARK%: VC: 99.6274% VO: 0.0407% A: 0.0000% Sys: 0.3319% = 100.0000%
w/ patch:
BENCHMARKs: VC: 307.750s VO: 0.112s A: 0.000s Sys: 1.093s = 308.954s
BENCHMARK%: VC: 99.6102% VO: 0.0361% A: 0.0000% Sys: 0.3537% = 100.0000%
-------------------------------------------------
Benchmark 4:
V for Vendetta DVD, copied to disk, with -ss 4:00 -frames 4000
MPlayer dev-SVN-r23390-4.1.2 (C) 2000-2007 MPlayer Team
CPU: AMD Athlon(tm) 64 Processor 3700+ (Family: 15, Model: 55, Stepping: 2)
CPUflags: MMX: 1 MMX2: 1 3DNow: 1 3DNow2: 1 SSE: 1 SSE2: 1
Compiled for x86 CPU with extensions: MMX MMX2 3DNow 3DNowEx SSE SSE2
Playing dvd://1.
libdvdread: Couldn't find device name.
There are 2 titles on this DVD.
There are 12 chapters in this DVD title.
There are 1 angles in this DVD title.
audio stream: 0 format: ac3 (stereo) language: unknown aid: 128.
number of audio channels on disk: 1.
number of subtitles on disk: 0
MPEG-PS file format detected.
VIDEO: MPEG2 720x480 (aspect 3) 29.970 fps 0.0 kbps ( 0.0 kbyte/s)
==========================================================================
Opening video decoder: [mpegpes] MPEG 1/2 Video passthrough
VDec: vo config request - 720 x 480 (preferred colorspace: Mpeg PES)
Could not find matching colorspace - retrying with -vf scale...
Opening video filter: [scale]
The selected video_out device is incompatible with this codec.
Try appending the scale filter to your filter list,
e.g. -vf spp,scale instead of -vf spp.
VDecoder init failed :(
Opening video decoder: [libmpeg2] MPEG 1/2 Video decoder libmpeg2-v0.4.0b
Selected video codec: [mpeg12] vfm: libmpeg2 (MPEG-1 or 2 (libmpeg2))
==========================================================================
Audio: no sound
Starting playback...
VDec: vo config request - 720 x 480 (preferred colorspace: Planar YV12)
VDec: using Planar YV12 as output csp (no 0)
Movie-Aspect is 1.78:1 - prescaling to correct movie aspect.
VO: [xmga] 720x480 => 854x480 Planar YV12
[MGA] Using 3 buffers.
w/o patch:
BENCHMARKs: VC: 6.520s VO: 42.695s A: 0.000s Sys: 0.313s = 49.528s
BENCHMARK%: VC: 13.1638% VO: 86.2044% A: 0.0000% Sys: 0.6318% = 100.0000%
w/ patch:
BENCHMARKs: VC: 6.178s VO: 36.899s A: 0.000s Sys: 0.308s = 43.386s
BENCHMARK%: VC: 14.2394% VO: 85.0499% A: 0.0000% Sys: 0.7107% = 100.0000%
-------------------------------------------------
Benchmark 5:
(first 10000 frames only)
MPlayer dev-SVN-r23390-4.1.2 (C) 2000-2007 MPlayer Team
CPU: AMD Athlon(tm) 64 Processor 3700+ (Family: 15, Model: 55, Stepping: 2)
CPUflags: MMX: 1 MMX2: 1 3DNow: 1 3DNow2: 1 SSE: 1 SSE2: 1
Compiled for x86 CPU with extensions: MMX MMX2 3DNow 3DNowEx SSE SSE2
Playing /tmp/Hana Bi.avi.
AVI file format detected.
[aviheader] Video stream found, -vid 0
[aviheader] Audio stream found, -aid 1
VIDEO: [DIVX] 560x320 24bpp 23.976 fps 809.7 kbps (98.8 kbyte/s)
==========================================================================
Opening video decoder: [ffmpeg] FFmpeg's libavcodec codec family
Selected video codec: [ffodivx] vfm: ffmpeg (FFmpeg MPEG-4)
==========================================================================
Audio: no sound
Starting playback...
[mpeg4 @ 0xcedc20]looks like this file was encoded with (divx4/(old)xvid/opendivx) -> forcing low_delay flag
VDec: vo config request - 560 x 320 (preferred colorspace: Planar YV12)
VDec: using Planar YV12 as output csp (no 0)
Movie-Aspect is undefined - no prescaling applied.
VO: [xmga] 560x320 => 560x320 Planar YV12
[MGA] Using 3 buffers.
w/o patch:
BENCHMARKs: VC: 71.232s VO: 0.018s A: 0.000s Sys: 0.199s = 71.449s
BENCHMARK%: VC: 99.6961% VO: 0.0257% A: 0.0000% Sys: 0.2781% = 100.0000%
w/ patch:
BENCHMARKs: VC: 61.344s VO: 0.019s A: 0.000s Sys: 0.174s = 61.537s
BENCHMARK%: VC: 99.6857% VO: 0.0316% A: 0.0000% Sys: 0.2827% = 100.0000%
-------------------------------------------------
I also did single-run tests of a few other samples similar to benchmarks
1 and 3 (i.e. anime encoded with DivX3, DivX4, Xvid and H.264); they didn't
show any significant speed difference (<1%).
Benchmarks 2 and 5 are the interesting ones: both are faster with
the patch (total 10.029s -> 8.883s and 71.449s -> 61.537s). They are
also the only ones I came across that were decoded with the low_delay
flag forced.
If someone is interested in this, I could search for more samples
of this kind; I should have some.
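For anyone who wants to compare the runs without doing the arithmetic by
hand, here is a minimal sketch in Python. It only assumes the
"BENCHMARKs:" line format shown above; the function names
(parse_benchmark, percent_change, report) are made up for this
illustration and are not part of MPlayer.

```python
import re

# BENCHMARKs lines copied verbatim from the runs above:
# benchmark number -> (line without patch, line with patch)
RUNS = {
    1: ("BENCHMARKs: VC: 19.974s VO: 44.601s A: 0.000s Sys: 0.093s = 64.668s",
        "BENCHMARKs: VC: 19.889s VO: 44.503s A: 0.000s Sys: 0.091s = 64.484s"),
    2: ("BENCHMARKs: VC: 9.980s VO: 0.003s A: 0.000s Sys: 0.046s = 10.029s",
        "BENCHMARKs: VC: 8.833s VO: 0.003s A: 0.000s Sys: 0.047s = 8.883s"),
    3: ("BENCHMARKs: VC: 307.732s VO: 0.126s A: 0.000s Sys: 1.025s = 308.883s",
        "BENCHMARKs: VC: 307.750s VO: 0.112s A: 0.000s Sys: 1.093s = 308.954s"),
    4: ("BENCHMARKs: VC: 6.520s VO: 42.695s A: 0.000s Sys: 0.313s = 49.528s",
        "BENCHMARKs: VC: 6.178s VO: 36.899s A: 0.000s Sys: 0.308s = 43.386s"),
    5: ("BENCHMARKs: VC: 71.232s VO: 0.018s A: 0.000s Sys: 0.199s = 71.449s",
        "BENCHMARKs: VC: 61.344s VO: 0.019s A: 0.000s Sys: 0.174s = 61.537s"),
}


def parse_benchmark(line):
    """Split one 'BENCHMARKs:' line into {component: seconds} plus 'total'."""
    # Matches "VC: 19.974s", "VO: 44.601s", "A: 0.000s", "Sys: 0.093s".
    times = {name: float(sec)
             for name, sec in re.findall(r"(\w+):\s*([0-9.]+)s", line)}
    total = re.search(r"=\s*([0-9.]+)s", line)
    if not times or total is None:
        raise ValueError("not a BENCHMARKs line: %r" % line)
    times["total"] = float(total.group(1))
    return times


def percent_change(before, after):
    """Relative change in percent; negative means faster with the patch."""
    return (after - before) / before * 100.0


def report(runs):
    """Return (number, total change in %, VC delta in s, VO delta in s)."""
    rows = []
    for number, (without, with_patch) in sorted(runs.items()):
        old, new = parse_benchmark(without), parse_benchmark(with_patch)
        rows.append((number,
                     percent_change(old["total"], new["total"]),
                     new["VC"] - old["VC"],
                     new["VO"] - old["VO"]))
    return rows


for number, total_pct, vc_delta, vo_delta in report(RUNS):
    print(f"benchmark {number}: total {total_pct:+.2f}%  "
          f"VC {vc_delta:+.3f}s  VO {vo_delta:+.3f}s")
```

Splitting the difference into VC and VO shows where it comes from: for
benchmarks 2 and 5 nearly all of it is in VC, while for benchmark 4 it
is mostly in VO.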
Attila Kinali
--
Linux is... when you can solve even simple things with a cryptic
postfix language
-- Daniel Hottinger