[MPlayer-dev-eng] [PATCH] remove mp3lib
Ivan Kalvachev
ikalvachev at gmail.com
Tue Sep 27 03:39:12 CEST 2011
On 9/24/11, Thomas Orgis <thomas-forum at orgis.org> wrote:
> Am Fri, 23 Sep 2011 16:24:20 +0200
> schrieb Thomas Orgis <thomas-forum at orgis.org>:
>
>> There have been posts to this list about this, but I don't think we
>> arrived at a final conclusion. Have to dig out the thread again... will
>> try to this evening.
>
> http://lists.mplayerhq.hu/pipermail/mplayer-dev-eng/2010-May/064656.html
>
> Around there seems to be the conclusion of the last discussion on the
> performance inconsistencies.
>
> Quoting Reimar:
>> > Not much, but I'd really start with cleaning up the stack issues
>> > with the 3dnow code, they might cause cache issue that can easily
>> > have that kind of effect.
>
> Well, he might have a point there. Meanwhile, I posted about the
> shiftily moving target that is the performance of mp3lib:
>
> http://lists.mplayerhq.hu/pipermail/mplayer-dev-eng/2010-May/064673.html
> and
> http://lists.mplayerhq.hu/pipermail/mplayer-dev-eng/2010-June/064726.html
>
> (I'd like to point out that it is a really, really, braindead idea to
> organize mailing list archives separated on monthly boundaries... you
> have a fun time stitching threads back together.)
Next time use gmane.org ;)
> I'm sorry that I did not push further on that topic, but I didn't (and
> still don't) have endless time to spare for such hunts. Actually, I was
> more busy hunting compiler bugs that sabotage my thesis work...
>
> Would be nice if we got this settled now for good.
It took me quite a lot of time, but I hope there will be useful
information in the data I provide.
If you need the raw perf.data files, I can send them to you.
First, a few observations:
- The lavf demuxer added 4 more seconds to all my tests... I wonder why
everybody thinks lavf demuxers are slow ;)
- Compiling specifically for my system lowered the mp3lib vs mpg123
speed difference from 8% to 6%.
- Increasing the alignment (of most big arrays) actually made mp3lib a
percent or two slower... so I abandoned that approach.
- -nocache actually makes the test slower. It is still within the margin
of error, but...
athlon$ mplayer -loop 3 -benchmark -ao pcm:fast:file=/dev/null -quiet stream.dump
MPlayer SVN-r34123-4.5.3 (C) 2000-2011 MPlayer Team
Selected audio codec: [mpg123] afm: mpg123 (MPEG 1.0/2.0/2.5 layers I, II, III)
AO: [pcm] 48000Hz 2ch s16le (2 bytes per sample)
BENCHMARKs: VC: 0.000s VO: 0.000s A: 51.116s Sys: 0.511s = 51.628s
BENCHMARKs: VC: 0.000s VO: 0.000s A: 51.040s Sys: 0.530s = 51.569s
BENCHMARKs: VC: 0.000s VO: 0.000s A: 51.137s Sys: 0.520s = 51.657s
athlon$ mplayer -loop 3 -benchmark -ao pcm:fast:file=/dev/null -quiet stream.dump -nocache
MPlayer SVN-r34123-4.5.3 (C) 2000-2011 MPlayer Team
Selected audio codec: [mpg123] afm: mpg123 (MPEG 1.0/2.0/2.5 layers I, II, III)
BENCHMARKs: VC: 0.000s VO: 0.000s A: 51.248s Sys: 0.508s = 51.756s
BENCHMARKs: VC: 0.000s VO: 0.000s A: 51.507s Sys: 0.551s = 52.058s
BENCHMARKs: VC: 0.000s VO: 0.000s A: 51.581s Sys: 0.528s = 52.109s
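To put a number on "within the margin of error", the six totals above can be averaged. This is only a minimal sketch: the values are copied from the BENCHMARKs lines, the variable and function names are made up for illustration, and three runs per setting are far too few for a firm statistical conclusion.

```python
from statistics import mean, stdev

# Total wall times in seconds, copied from the BENCHMARKs lines above.
cached = [51.628, 51.569, 51.657]
nocache = [51.756, 52.058, 52.109]

def summarize(label, runs):
    """Print and return the mean and sample standard deviation of the runs."""
    m, s = mean(runs), stdev(runs)
    print(f"{label:8s} mean {m:.3f}s  stdev {s:.3f}s")
    return m, s

cached_mean, cached_sd = summarize("cache", cached)
nocache_mean, nocache_sd = summarize("nocache", nocache)

# Difference between the two settings, absolute and relative to the cached run.
delta = nocache_mean - cached_mean
print(f"-nocache is slower by {delta:.3f}s ({100 * delta / cached_mean:.2f}%)")
```

The difference comes out well under one percent, and the -nocache runs scatter more than the cached ones, so more repetitions would be needed before reading much into it.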
======================
In the following single `perf stat` runs, one thing is easy to see:
Cache misses are about 2 times higher in the mpg123 case, and I think
this is what affects the instructions-per-cycle ratio. Keep in mind that
older CPUs have smaller caches, so they are affected more by cache
misses.
======================
Performance counter stats for './mplayer -benchmark -ao pcm:fast:file=/dev/null -quiet -nocache stream.dump -ac mpg123':
   49288983690 cycles:u                 #  0.000 GHz                    [60.00%]
   54555534568 instructions:u           #  1.11  insns per cycle        [60.00%]
   30754388474 l1-dcache-loads:u                                        [60.00%]
     104031324 l1-dcache-load-misses:u  #  0.34% of all L1-dcache hits  [60.00%]
     668863945 branch-misses                                            [60.00%]
  53.125250828 seconds time elapsed
Performance counter stats for './mplayer -benchmark -ao pcm:fast:file=/dev/null -quiet -nocache stream.dump -ac mp3':
   47339857524 cycles:u                 #  0.000 GHz                    [60.00%]
   54276441494 instructions:u           #  1.15  insns per cycle        [60.00%]
   28642127395 l1-dcache-loads:u                                        [60.00%]
     176993369 l1-dcache-load-misses:u  #  0.62% of all L1-dcache hits  [60.00%]
     665329666 branch-misses                                            [60.00%]
  50.901825505 seconds time elapsed
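The derived figures in the two `perf stat` listings (instructions per cycle and the L1 data-cache load miss ratio) can be recomputed from the raw counters. The sketch below copies the counters as printed; the dictionary layout and the `derived` helper are illustrative names, and labelling the `-ac mp3` run as "mp3lib" is an assumption based on the comparison this post is about.

```python
# Raw user-space counters copied from the two `perf stat` runs above.
# "mp3lib" stands for the run started with `-ac mp3` (assumed mapping).
runs = {
    "mpg123": dict(cycles=49288983690, instructions=54555534568,
                   loads=30754388474, load_misses=104031324,
                   elapsed=53.125250828),
    "mp3lib": dict(cycles=47339857524, instructions=54276441494,
                   loads=28642127395, load_misses=176993369,
                   elapsed=50.901825505),
}

def derived(c):
    """Return (instructions per cycle, L1d load miss ratio in percent)."""
    ipc = c["instructions"] / c["cycles"]
    miss_pct = 100.0 * c["load_misses"] / c["loads"]
    return ipc, miss_pct

for name, c in runs.items():
    ipc, miss_pct = derived(c)
    print(f"{name:7s} IPC {ipc:.2f}  L1d load misses {miss_pct:.2f}%  "
          f"{c['elapsed']:.1f}s")

# Relative wall-time difference between the two single runs.
slowdown = runs["mpg123"]["elapsed"] / runs["mp3lib"]["elapsed"] - 1
print(f"mpg123 wall time is {100 * slowdown:.1f}% longer in these single runs")
```

This reproduces the 1.11 / 1.15 insns per cycle and the 0.34% / 0.62% miss ratios that perf prints, which makes it easy to redo the comparison when new counter values come in.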
==============
==============
athlon# perf record -e cycles:u -e instructions:u -e l1-dcache-loads:u \
    -e l1-dcache-load-misses:u -e branch-misses \
    ./mplayer -loop 10 -benchmark -ao pcm:fast:file=/dev/null -quiet -nocache stream.dump -ac mp3
athlon# perf report
# Events: 2K cycles
#
# Overhead Command Shared Object Symbol
# ........ ....... ................. ............................
#
31.31% mplayer mplayer [.] III_dequantize_sample
18.69% mplayer mplayer [.] synth_1to1_MMX
15.99% mplayer mplayer [.] dct64_MMX_3dnowex
15.58% mplayer mplayer [.] dct36_3dnowex
8.13% mplayer mplayer [.] do_layer3.clone.3
1.62% mplayer libc-2.13.so [.] memcpy
1.49% mplayer ld-2.13.so [.] do_lookup_x
1.27% mplayer ld-2.13.so [.] check_match.8296
0.82% mplayer mplayer [.] fast_memcpy
0.64% mplayer mplayer [.] III_get_scale_factors_1
0.46% mplayer mplayer [.] synth_1to1
0.43% mplayer ld-2.13.so [.] strcmp
0.38% mplayer ld-2.13.so [.] .L180
0.37% mplayer mplayer [.] demux_audio_fill_buffer
0.27% mplayer mplayer [.] MP3_DecodeFrame
# Events: 2K instructions
#
# Overhead Command Shared Object Symbol
# ........ ....... ...................... ............................
#
26.46% mplayer mplayer [.] III_dequantize_sample
23.69% mplayer mplayer [.] synth_1to1_MMX
18.06% mplayer mplayer [.] dct64_MMX_3dnowex
15.58% mplayer mplayer [.] dct36_3dnowex
11.58% mplayer mplayer [.] do_layer3.clone.3
0.95% mplayer mplayer [.] synth_1to1
0.90% mplayer mplayer [.] III_get_scale_factors_1
0.44% mplayer ld-2.13.so [.] do_lookup_x
0.38% mplayer mplayer [.] dct12
0.24% mplayer ld-2.13.so [.] check_match.8296
0.19% mplayer mplayer [.] demux_audio_fill_buffer
0.17% mplayer ld-2.13.so [.] strcmp
0.15% mplayer mplayer [.] demux_mpg_fill_buffer
0.14% mplayer mplayer [.] nsv_check_file
0.09% mplayer mplayer [.] fast_memcpy
0.09% mplayer mplayer [.] MP3_DecodeFrame
0.09% mplayer libc-2.13.so [.] malloc_consolidate
# Events: 2K L1-dcache-loads
#
# Overhead Command Shared Object Symbol
# ........ ....... ...................... .......................
#
28.01% mplayer mplayer [.] synth_1to1_MMX
22.17% mplayer mplayer [.] dct64_MMX_3dnowex
16.62% mplayer mplayer [.] dct36_3dnowex
16.20% mplayer mplayer [.] III_dequantize_sample
9.84% mplayer mplayer [.] do_layer3.clone.3
2.24% mplayer libc-2.13.so [.] memcpy
1.33% mplayer mplayer [.] synth_1to1
0.57% mplayer ld-2.13.so [.] do_lookup_x
0.47% mplayer mplayer [.] III_get_scale_factors_1
0.38% mplayer mplayer [.] fast_memcpy
0.35% mplayer ld-2.13.so [.] check_match.8296
0.24% mplayer libc-2.13.so [.] _int_malloc
0.24% mplayer mplayer [.] dct12
0.15% mplayer ld-2.13.so [.] strcmp
0.14% mplayer libc-2.13.so [.] __cfree
0.14% mplayer mplayer [.] MP3_DecodeFrame
0.14% mplayer mplayer [.] demux_read_data
# Events: 1K L1-dcache-load-misses
#
# Overhead Command Shared Object Symbol
# ........ ....... ................. .........................
#
24.61% mplayer libc-2.13.so [.] memcpy
15.37% mplayer mplayer [.] synth_1to1_MMX
9.81% mplayer mplayer [.] fast_memcpy
9.44% mplayer mplayer [.] dct36_3dnowex
8.47% mplayer mplayer [.] dct64_MMX_3dnowex
7.20% mplayer mplayer [.] III_dequantize_sample
5.34% mplayer ld-2.13.so [.] do_lookup_x
4.50% mplayer ld-2.13.so [.] check_match.8296
3.83% mplayer mplayer [.] do_layer3.clone.3
1.13% mplayer libc-2.13.so [.] __GI___mempcpy
0.93% mplayer ld-2.13.so [.] .L180
0.90% mplayer ld-2.13.so [.] strcmp
0.67% mplayer mplayer [.] synth_1to1
0.64% mplayer libc-2.13.so [.] __GI_memmove
0.58% mplayer libc-2.13.so [.] _int_malloc
0.52% mplayer libc-2.13.so [.] __malloc
0.48% mplayer mplayer [.] demux_audio_fill_buffer
0.48% mplayer [kernel.kallsyms] [k] page_fault
0.43% mplayer mplayer [.] demux_read_data
# Events: 2K branch-misses
#
# Overhead Command Shared Object Symbol
# ........ ....... ................. ...........................
#
79.96% mplayer mplayer [.] III_dequantize_sample
4.82% mplayer mplayer [.] synth_1to1_MMX
3.38% mplayer mplayer [.] do_layer3.clone.3
2.59% mplayer mplayer [.] dct64_MMX_3dnowex
0.41% mplayer ld-2.13.so [.] do_lookup_x
0.41% mplayer [kernel.kallsyms] [k] __lock_acquire
0.38% mplayer mplayer [.] fast_memcpy
0.38% mplayer mplayer [.] ds_fill_buffer
0.33% mplayer [kernel.kallsyms] [k] lock_acquire
0.33% mplayer mplayer [.] dct36_3dnowex
0.33% mplayer mplayer [.] III_get_scale_factors_1
0.32% mplayer mplayer [.] demux_audio_fill_buffer
athlon# perf record -e cycles:u -e instructions:u -e l1-dcache-loads:u \
    -e l1-dcache-load-misses:u -e branch-misses \
    ./mplayer -loop 10 -benchmark -ao pcm:fast:file=/dev/null -quiet -nocache stream.dump -ac mpg123
athlon# perf report
# Events: 3K cycles
#
# Overhead Command Shared Object Symbol
# ........ ....... ................... ................................
#
31.04% mplayer libmpg123.so.0.29.6 [.] III_dequantize_sample
15.34% mplayer libmpg123.so.0.29.6 [.] INT123_dct64_3dnowext
14.67% mplayer libmpg123.so.0.29.6 [.] INT123_dct36_3dnowext
11.26% mplayer libmpg123.so.0.29.6 [.] synth_1to1_3dnowext_asm
7.42% mplayer libmpg123.so.0.29.6 [.] INT123_do_layer3
7.15% mplayer libmpg123.so.0.29.6 [.] .next_loop
3.10% mplayer libc-2.13.so [.] memcpy
1.39% mplayer libmpg123.so.0.29.6 [.] INT123_synth_1to1_3dnowext
1.15% mplayer libmpg123.so.0.29.6 [.] III_get_scale_factors_1.clone.1
1.04% mplayer ld-2.13.so [.] do_lookup_x
0.77% mplayer ld-2.13.so [.] check_match.8296
0.49% mplayer libmpg123.so.0.29.6 [.] dct12
# Events: 3K instructions
#
# Overhead Command Shared Object Symbol
# ........ ....... ...................... ...............................
#
23.14% mplayer libmpg123.so.0.29.6 [.] III_dequantize_sample
19.15% mplayer libmpg123.so.0.29.6 [.] INT123_dct64_3dnowext
15.96% mplayer libmpg123.so.0.29.6 [.] INT123_dct36_3dnowext
13.29% mplayer libmpg123.so.0.29.6 [.] synth_1to1_3dnowext_asm
11.04% mplayer libmpg123.so.0.29.6 [.] .next_loop
10.96% mplayer libmpg123.so.0.29.6 [.] INT123_do_layer3
1.88% mplayer libmpg123.so.0.29.6 [.] INT123_synth_1to1_3dnowext
1.11% mplayer libmpg123.so.0.29.6 [.] III_get_scale_factors_1.clone.1
0.40% mplayer libmpg123.so.0.29.6 [.] dct12
0.33% mplayer ld-2.13.so [.] do_lookup_x
0.26% mplayer libc-2.13.so [.] _int_malloc
# Events: 3K L1-dcache-loads
#
# Overhead Command Shared Object Symbol
# ........ ....... ................... ...............................
#
21.63% mplayer libmpg123.so.0.29.6 [.] INT123_dct64_3dnowext
15.47% mplayer libmpg123.so.0.29.6 [.] synth_1to1_3dnowext_asm
15.28% mplayer libmpg123.so.0.29.6 [.] III_dequantize_sample
14.05% mplayer libmpg123.so.0.29.6 [.] INT123_dct36_3dnowext
11.40% mplayer libmpg123.so.0.29.6 [.] .next_loop
11.07% mplayer libmpg123.so.0.29.6 [.] INT123_do_layer3
4.02% mplayer libc-2.13.so [.] memcpy
2.06% mplayer libmpg123.so.0.29.6 [.] INT123_synth_1to1_3dnowext
1.20% mplayer libmpg123.so.0.29.6 [.] III_get_scale_factors_1.clone.1
0.37% mplayer libmpg123.so.0.29.6 [.] dct12
0.31% mplayer ld-2.13.so [.] do_lookup_x
# Events: 2K L1-dcache-load-misses
#
# Overhead Command Shared Object Symbol
# ........ ....... ................... ...............................
#
38.99% mplayer libc-2.13.so [.] memcpy
11.54% mplayer libmpg123.so.0.29.6 [.] III_dequantize_sample
11.00% mplayer libmpg123.so.0.29.6 [.] INT123_dct36_3dnowext
8.19% mplayer libmpg123.so.0.29.6 [.] INT123_do_layer3
5.05% mplayer ld-2.13.so [.] do_lookup_x
4.48% mplayer libmpg123.so.0.29.6 [.] synth_1to1_3dnowext_asm
4.12% mplayer libmpg123.so.0.29.6 [.] .next_loop
2.49% mplayer libmpg123.so.0.29.6 [.] INT123_dct64_3dnowext
2.32% mplayer ld-2.13.so [.] check_match.8296
1.48% mplayer libc-2.13.so [.] __GI___mempcpy
0.81% mplayer libc-2.13.so [.] _int_malloc
0.80% mplayer mplayer [.] demux_audio_fill_buffer
0.73% mplayer ld-2.13.so [.] strcmp
0.69% mplayer libc-2.13.so [.] __cfree
0.64% mplayer ld-2.13.so [.] .L180
0.46% mplayer libmpg123.so.0.29.6 [.] buffered_forget
0.42% mplayer mplayer [.] ds_fill_buffer
# Events: 3K branch-misses
#
# Overhead Command Shared Object Symbol
# ........ ....... ................... ...............................
#
80.36% mplayer libmpg123.so.0.29.6 [.] III_dequantize_sample
3.09% mplayer libmpg123.so.0.29.6 [.] synth_1to1_3dnowext_asm
2.24% mplayer libmpg123.so.0.29.6 [.] INT123_do_layer3
1.11% mplayer libmpg123.so.0.29.6 [.] INT123_dct64_3dnowext
1.00% mplayer libmpg123.so.0.29.6 [.] synth_stereo_wrap
0.67% mplayer libmpg123.so.0.29.6 [.] .next_loop
0.64% mplayer libc-2.13.so [.] memcpy
0.64% mplayer [kernel.kallsyms] [k] __lock_acquire
0.61% mplayer libmpg123.so.0.29.6 [.] INT123_dct36_3dnowext
0.50% mplayer libmpg123.so.0.29.6 [.] mpg123_decode
0.46% mplayer libmpg123.so.0.29.6 [.] INT123_synth_1to1_3dnowext
==========
Strange things I noticed in the above reports:
- I see ld.so "do_lookup_x" taking a noticeable amount of time. To
benchmark less of the startup and more of the runtime execution, I first
added "-loop 5". However, it had a much smaller impact than I expected:
the share only dropped from about 1.6% to 1.2%. It is quite strange that
something that (probably) looks up symbols is so persistent in runtime
benchmarks. Dalias didn't have an explanation, other than that I might
have something nasty in LD_PRELOAD (I didn't see anything).
(The function shows up with both mp3lib and mpg123.)
- There is a memcpy in glibc that accounts for most of the cache misses. I
wonder how I can make a nice percentage table of the functions that
call this memcpy().
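One possible way to get such a table, sketched under several assumptions: the profile would have to be recorded with call graphs (`perf record -g`), and the stacks then exported in a "folded" one-line-per-stack form (outermost frame first, frames separated by `;`, sample count last). That export step and the folded format are assumptions, not something used in the runs above, and the input below is made up purely to show the shape of the data.

```python
from collections import Counter

def memcpy_callers(folded_lines, target="memcpy"):
    """Return {caller: percent of target samples} from folded stack lines.

    Each line is assumed to look like "frame1;frame2;...;leaf count".
    """
    counts = Counter()
    for line in folded_lines:
        stack, _, n = line.rpartition(" ")
        frames = stack.split(";")
        # Count only stacks whose leaf is the target and that have a caller.
        if len(frames) >= 2 and frames[-1] == target:
            counts[frames[-2]] += int(n)
    total = sum(counts.values())
    if not total:
        return {}
    return {f: 100.0 * c / total for f, c in counts.most_common()}

# Hypothetical input: function names and counts are invented.
sample = [
    "main;caller_a;memcpy 30",
    "main;caller_b;memcpy 10",
    "main;caller_a;other 5",
    "main;caller_c;caller_a;memcpy 10",
]
table = memcpy_callers(sample)
for caller, pct in table.items():
    print(f"{pct:6.2f}%  {caller}")
```

Only the direct caller of memcpy() is counted here; walking further up the stack would be needed if the interesting caller sits behind a wrapper such as fast_memcpy.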
Conclusion:
The slowdown may be caused by something as simple as moving
variables and buffers around.
III_dequantize_sample() seems to be the function that needs the most
love, and it seems to be the core function. It has a lot of cache misses
and a lot of branch misses, all things that older CPUs are quite bad at
;)
Best Regards.