[MPlayer-dev-eng] [PATCH] remove mp3lib

Ivan Kalvachev ikalvachev at gmail.com
Tue Sep 27 03:39:12 CEST 2011


On 9/24/11, Thomas Orgis <thomas-forum at orgis.org> wrote:
> Am Fri, 23 Sep 2011 16:24:20 +0200
> schrieb Thomas Orgis <thomas-forum at orgis.org>:
>
>> There have been posts to this list about this, but I don't think we
>> arrived at a final conclusion. Have to dig out the thread again... will
>> try to this evening.
>
> http://lists.mplayerhq.hu/pipermail/mplayer-dev-eng/2010-May/064656.html
>
> Around there seems to be the conclusion of the last discussion on the
> performance inconsistencies.
>
> Quoting Reimar:
>> > Not much, but I'd really start with cleaning up the stack issues
>> > with the 3dnow code, they might cause cache issue that can easily
>> > have that kind of effect.
>
> Well, he might have a point there. Meanwhile, I posted about the
> shiftily moving target that is the performance of mp3lib:
>
> http://lists.mplayerhq.hu/pipermail/mplayer-dev-eng/2010-May/064673.html
> and
> http://lists.mplayerhq.hu/pipermail/mplayer-dev-eng/2010-June/064726.html
>
> (I'd like to point out that it is a really, really, braindead idea to
> organize mailing list archives separated on monthly boundaries... you
> have a fun time stitching threads back together.)

Next time use gmane.org ;)

> I'm sorry that I did not push further on that topic, but I didn't (and
> still don't) have endless time to spare for such hunts. Actually, I was
> more busy hunting compiler bugs that sabotage my thesis work...
>
> Would be nice if we got this settled now for good.

It took me quite a lot of time, but I hope there will be useful
information in the data I provide.
If you need the raw perf.data files, I can send them to you.

First, a few observations:

- The lavf demuxer added 4 more seconds to all my tests... I wonder why
everybody thinks lavf demuxers are slow ;)

- Compiling specifically for my system lowered the mp3lib vs mpg123
speed difference from 8% to 6%.

- Increasing the alignment (of most big arrays) actually made mp3lib a
percent or two slower... so I abandoned that approach.

- -nocache actually makes the test slower. It is still within the margin
of error, but...

athlon$ mplayer -loop 3 -benchmark -ao pcm:fast:file=/dev/null -quiet stream.dump
MPlayer SVN-r34123-4.5.3 (C) 2000-2011 MPlayer Team
Selected audio codec: [mpg123] afm: mpg123 (MPEG 1.0/2.0/2.5 layers I, II, III)
AO: [pcm] 48000Hz 2ch s16le (2 bytes per sample)
BENCHMARKs: VC:   0.000s VO:   0.000s A:  51.116s Sys:   0.511s =   51.628s
BENCHMARKs: VC:   0.000s VO:   0.000s A:  51.040s Sys:   0.530s =   51.569s
BENCHMARKs: VC:   0.000s VO:   0.000s A:  51.137s Sys:   0.520s =   51.657s

athlon$ mplayer -loop 3 -benchmark -ao pcm:fast:file=/dev/null -quiet stream.dump -nocache
MPlayer SVN-r34123-4.5.3 (C) 2000-2011 MPlayer Team
Selected audio codec: [mpg123] afm: mpg123 (MPEG 1.0/2.0/2.5 layers I, II, III)
BENCHMARKs: VC:   0.000s VO:   0.000s A:  51.248s Sys:   0.508s =   51.756s
BENCHMARKs: VC:   0.000s VO:   0.000s A:  51.507s Sys:   0.551s =   52.058s
BENCHMARKs: VC:   0.000s VO:   0.000s A:  51.581s Sys:   0.528s =   52.109s
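
To put a rough number on "within the margin of error", the six totals
above can be compared directly. This is only a minimal sketch in Python:
the variable and function names are mine, and the values are the
BENCHMARKs totals copied from the two runs above.

```python
from statistics import mean, stdev

# Total times in seconds, copied from the BENCHMARKs lines above.
with_cache = [51.628, 51.569, 51.657]
no_cache = [51.756, 52.058, 52.109]


def summarize(label, totals):
    """Print and return the mean and sample standard deviation of one set."""
    m, s = mean(totals), stdev(totals)
    print(f"{label:10s} mean {m:.3f}s  stdev {s:.3f}s")
    return m, s


cache_mean, cache_sd = summarize("cache", with_cache)
nocache_mean, nocache_sd = summarize("-nocache", no_cache)

# Relative slowdown of -nocache versus the default (cached) run.
slowdown_pct = (nocache_mean - cache_mean) / cache_mean * 100
print(f"-nocache is {slowdown_pct:.2f}% slower on average")
```

With only three loops per configuration the spread estimate is crude,
but the average difference comes out under one percent.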

======================

In the following single `perf stat` runs one thing stands out:
cache misses are about 2 times higher for the mpg123 case, and I think
this is what affects the instructions-per-cycle ratio. Keep in mind that
older CPUs have smaller caches, so they are affected more by cache
misses.
======================
 Performance counter stats for './mplayer -benchmark -ao pcm:fast:file=/dev/null -quiet -nocache stream.dump -ac mpg123':

       49288983690 cycles:u                  #    0.000 GHz                     [60.00%]
       54555534568 instructions:u            #    1.11  insns per cycle         [60.00%]
       30754388474 l1-dcache-loads:u                                            [60.00%]
         104031324 l1-dcache-load-misses:u   #    0.34% of all L1-dcache hits   [60.00%]
         668863945 branch-misses                                                [60.00%]

      53.125250828 seconds time elapsed

 Performance counter stats for './mplayer -benchmark -ao pcm:fast:file=/dev/null -quiet -nocache stream.dump -ac mp3':

       47339857524 cycles:u                  #    0.000 GHz                     [60.00%]
       54276441494 instructions:u            #    1.15  insns per cycle         [60.00%]
       28642127395 l1-dcache-loads:u                                            [60.00%]
         176993369 l1-dcache-load-misses:u   #    0.62% of all L1-dcache hits   [60.00%]
         665329666 branch-misses                                                [60.00%]

      50.901825505 seconds time elapsed
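
The derived figures in the two `perf stat` outputs can be re-checked from
the raw counters. The following is a small sketch in Python, assuming
nothing beyond the numbers quoted above (the dictionary keys and helper
name are made up for illustration).

```python
# Counter values copied from the two `perf stat` runs above.
runs = {
    "mpg123": {
        "cycles": 49288983690,
        "instructions": 54555534568,
        "l1_loads": 30754388474,
        "l1_misses": 104031324,
        "elapsed_s": 53.125250828,
    },
    "mp3": {
        "cycles": 47339857524,
        "instructions": 54276441494,
        "l1_loads": 28642127395,
        "l1_misses": 176993369,
        "elapsed_s": 50.901825505,
    },
}


def derived(counters):
    """Instructions per cycle and L1 data-cache miss rate for one run."""
    return {
        "ipc": counters["instructions"] / counters["cycles"],
        "miss_rate_pct": 100 * counters["l1_misses"] / counters["l1_loads"],
    }


metrics = {name: derived(c) for name, c in runs.items()}
for name, m in metrics.items():
    print(f"{name:7s} IPC {m['ipc']:.2f}  L1 miss rate {m['miss_rate_pct']:.2f}%")

# How the two runs compare against each other.
miss_ratio = runs["mp3"]["l1_misses"] / runs["mpg123"]["l1_misses"]
time_gap_pct = 100 * (runs["mpg123"]["elapsed_s"] / runs["mp3"]["elapsed_s"] - 1)
print(f"mp3/mpg123 L1 miss count ratio: {miss_ratio:.2f}")
print(f"mpg123 wall-clock time is {time_gap_pct:.1f}% longer")
```

Note that, taken at face value, these two single runs put the higher L1
miss count on the mp3 run (roughly 1.7x), not on the mpg123 run. Each
counter was only active 60% of the time and there is one run per
decoder, so these figures are worth re-measuring before drawing
conclusions from them.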

==============
==============
athlon# perf record -e cycles:u -e instructions:u -e l1-dcache-loads:u \
    -e l1-dcache-load-misses:u -e branch-misses ./mplayer -loop 10 \
    -benchmark -ao pcm:fast:file=/dev/null -quiet -nocache stream.dump \
    -ac mp3
athlon# perf report

# Events: 2K cycles
#
# Overhead  Command      Shared Object                        Symbol
# ........  .......  .................  ............................
#
    31.31%  mplayer  mplayer            [.] III_dequantize_sample
    18.69%  mplayer  mplayer            [.] synth_1to1_MMX
    15.99%  mplayer  mplayer            [.] dct64_MMX_3dnowex
    15.58%  mplayer  mplayer            [.] dct36_3dnowex
     8.13%  mplayer  mplayer            [.] do_layer3.clone.3
     1.62%  mplayer  libc-2.13.so       [.] memcpy
     1.49%  mplayer  ld-2.13.so         [.] do_lookup_x
     1.27%  mplayer  ld-2.13.so         [.] check_match.8296
     0.82%  mplayer  mplayer            [.] fast_memcpy
     0.64%  mplayer  mplayer            [.] III_get_scale_factors_1
     0.46%  mplayer  mplayer            [.] synth_1to1
     0.43%  mplayer  ld-2.13.so         [.] strcmp
     0.38%  mplayer  ld-2.13.so         [.] .L180
     0.37%  mplayer  mplayer            [.] demux_audio_fill_buffer
     0.27%  mplayer  mplayer            [.] MP3_DecodeFrame


# Events: 2K instructions
#
# Overhead  Command           Shared Object                        Symbol
# ........  .......  ......................  ............................
#
    26.46%  mplayer  mplayer                 [.] III_dequantize_sample
    23.69%  mplayer  mplayer                 [.] synth_1to1_MMX
    18.06%  mplayer  mplayer                 [.] dct64_MMX_3dnowex
    15.58%  mplayer  mplayer                 [.] dct36_3dnowex
    11.58%  mplayer  mplayer                 [.] do_layer3.clone.3
     0.95%  mplayer  mplayer                 [.] synth_1to1
     0.90%  mplayer  mplayer                 [.] III_get_scale_factors_1
     0.44%  mplayer  ld-2.13.so              [.] do_lookup_x
     0.38%  mplayer  mplayer                 [.] dct12
     0.24%  mplayer  ld-2.13.so              [.] check_match.8296
     0.19%  mplayer  mplayer                 [.] demux_audio_fill_buffer
     0.17%  mplayer  ld-2.13.so              [.] strcmp
     0.15%  mplayer  mplayer                 [.] demux_mpg_fill_buffer
     0.14%  mplayer  mplayer                 [.] nsv_check_file
     0.09%  mplayer  mplayer                 [.] fast_memcpy
     0.09%  mplayer  mplayer                 [.] MP3_DecodeFrame
     0.09%  mplayer  libc-2.13.so            [.] malloc_consolidate

# Events: 2K L1-dcache-loads
#
# Overhead  Command           Shared Object                   Symbol
# ........  .......  ......................  .......................
#
    28.01%  mplayer  mplayer                 [.] synth_1to1_MMX
    22.17%  mplayer  mplayer                 [.] dct64_MMX_3dnowex
    16.62%  mplayer  mplayer                 [.] dct36_3dnowex
    16.20%  mplayer  mplayer                 [.] III_dequantize_sample
     9.84%  mplayer  mplayer                 [.] do_layer3.clone.3
     2.24%  mplayer  libc-2.13.so            [.] memcpy
     1.33%  mplayer  mplayer                 [.] synth_1to1
     0.57%  mplayer  ld-2.13.so              [.] do_lookup_x
     0.47%  mplayer  mplayer                 [.] III_get_scale_factors_1
     0.38%  mplayer  mplayer                 [.] fast_memcpy
     0.35%  mplayer  ld-2.13.so              [.] check_match.8296
     0.24%  mplayer  libc-2.13.so            [.] _int_malloc
     0.24%  mplayer  mplayer                 [.] dct12
     0.15%  mplayer  ld-2.13.so              [.] strcmp
     0.14%  mplayer  libc-2.13.so            [.] __cfree
     0.14%  mplayer  mplayer                 [.] MP3_DecodeFrame
     0.14%  mplayer  mplayer                 [.] demux_read_data

# Events: 1K L1-dcache-load-misses
#
# Overhead  Command      Shared Object                     Symbol
# ........  .......  .................  .........................
#
    24.61%  mplayer  libc-2.13.so       [.] memcpy
    15.37%  mplayer  mplayer            [.] synth_1to1_MMX
     9.81%  mplayer  mplayer            [.] fast_memcpy
     9.44%  mplayer  mplayer            [.] dct36_3dnowex
     8.47%  mplayer  mplayer            [.] dct64_MMX_3dnowex
     7.20%  mplayer  mplayer            [.] III_dequantize_sample
     5.34%  mplayer  ld-2.13.so         [.] do_lookup_x
     4.50%  mplayer  ld-2.13.so         [.] check_match.8296
     3.83%  mplayer  mplayer            [.] do_layer3.clone.3
     1.13%  mplayer  libc-2.13.so       [.] __GI___mempcpy
     0.93%  mplayer  ld-2.13.so         [.] .L180
     0.90%  mplayer  ld-2.13.so         [.] strcmp
     0.67%  mplayer  mplayer            [.] synth_1to1
     0.64%  mplayer  libc-2.13.so       [.] __GI_memmove
     0.58%  mplayer  libc-2.13.so       [.] _int_malloc
     0.52%  mplayer  libc-2.13.so       [.] __malloc
     0.48%  mplayer  mplayer            [.] demux_audio_fill_buffer
     0.48%  mplayer  [kernel.kallsyms]  [k] page_fault
     0.43%  mplayer  mplayer            [.] demux_read_data


# Events: 2K branch-misses
#
# Overhead  Command      Shared Object                       Symbol
# ........  .......  .................  ...........................
#
    79.96%  mplayer  mplayer            [.] III_dequantize_sample
     4.82%  mplayer  mplayer            [.] synth_1to1_MMX
     3.38%  mplayer  mplayer            [.] do_layer3.clone.3
     2.59%  mplayer  mplayer            [.] dct64_MMX_3dnowex
     0.41%  mplayer  ld-2.13.so         [.] do_lookup_x
     0.41%  mplayer  [kernel.kallsyms]  [k] __lock_acquire
     0.38%  mplayer  mplayer            [.] fast_memcpy
     0.38%  mplayer  mplayer            [.] ds_fill_buffer
     0.33%  mplayer  [kernel.kallsyms]  [k] lock_acquire
     0.33%  mplayer  mplayer            [.] dct36_3dnowex
     0.33%  mplayer  mplayer            [.] III_get_scale_factors_1
     0.32%  mplayer  mplayer            [.] demux_audio_fill_buffer



athlon# perf record -e cycles:u -e instructions:u -e l1-dcache-loads:u \
    -e l1-dcache-load-misses:u -e branch-misses ./mplayer -loop 10 \
    -benchmark -ao pcm:fast:file=/dev/null -quiet -nocache stream.dump \
    -ac mpg123
athlon# perf report

# Events: 3K cycles
#
# Overhead  Command        Shared Object                            Symbol
# ........  .......  ...................  ................................
#
    31.04%  mplayer  libmpg123.so.0.29.6  [.] III_dequantize_sample
    15.34%  mplayer  libmpg123.so.0.29.6  [.] INT123_dct64_3dnowext
    14.67%  mplayer  libmpg123.so.0.29.6  [.] INT123_dct36_3dnowext
    11.26%  mplayer  libmpg123.so.0.29.6  [.] synth_1to1_3dnowext_asm
     7.42%  mplayer  libmpg123.so.0.29.6  [.] INT123_do_layer3
     7.15%  mplayer  libmpg123.so.0.29.6  [.] .next_loop
     3.10%  mplayer  libc-2.13.so         [.] memcpy
     1.39%  mplayer  libmpg123.so.0.29.6  [.] INT123_synth_1to1_3dnowext
     1.15%  mplayer  libmpg123.so.0.29.6  [.] III_get_scale_factors_1.clone.1
     1.04%  mplayer  ld-2.13.so           [.] do_lookup_x
     0.77%  mplayer  ld-2.13.so           [.] check_match.8296
     0.49%  mplayer  libmpg123.so.0.29.6  [.] dct12


# Events: 3K instructions
#
# Overhead  Command           Shared Object                           Symbol
# ........  .......  ......................  ...............................
#
    23.14%  mplayer  libmpg123.so.0.29.6     [.] III_dequantize_sample
    19.15%  mplayer  libmpg123.so.0.29.6     [.] INT123_dct64_3dnowext
    15.96%  mplayer  libmpg123.so.0.29.6     [.] INT123_dct36_3dnowext
    13.29%  mplayer  libmpg123.so.0.29.6     [.] synth_1to1_3dnowext_asm
    11.04%  mplayer  libmpg123.so.0.29.6     [.] .next_loop
    10.96%  mplayer  libmpg123.so.0.29.6     [.] INT123_do_layer3
     1.88%  mplayer  libmpg123.so.0.29.6     [.] INT123_synth_1to1_3dnowext
     1.11%  mplayer  libmpg123.so.0.29.6     [.] III_get_scale_factors_1.clone.1
     0.40%  mplayer  libmpg123.so.0.29.6     [.] dct12
     0.33%  mplayer  ld-2.13.so              [.] do_lookup_x
     0.26%  mplayer  libc-2.13.so            [.] _int_malloc

# Events: 3K L1-dcache-loads
#
# Overhead  Command        Shared Object                           Symbol
# ........  .......  ...................  ...............................
#
    21.63%  mplayer  libmpg123.so.0.29.6  [.] INT123_dct64_3dnowext
    15.47%  mplayer  libmpg123.so.0.29.6  [.] synth_1to1_3dnowext_asm
    15.28%  mplayer  libmpg123.so.0.29.6  [.] III_dequantize_sample
    14.05%  mplayer  libmpg123.so.0.29.6  [.] INT123_dct36_3dnowext
    11.40%  mplayer  libmpg123.so.0.29.6  [.] .next_loop
    11.07%  mplayer  libmpg123.so.0.29.6  [.] INT123_do_layer3
     4.02%  mplayer  libc-2.13.so         [.] memcpy
     2.06%  mplayer  libmpg123.so.0.29.6  [.] INT123_synth_1to1_3dnowext
     1.20%  mplayer  libmpg123.so.0.29.6  [.] III_get_scale_factors_1.clone.1
     0.37%  mplayer  libmpg123.so.0.29.6  [.] dct12
     0.31%  mplayer  ld-2.13.so           [.] do_lookup_x


# Events: 2K L1-dcache-load-misses
#
# Overhead  Command        Shared Object                           Symbol
# ........  .......  ...................  ...............................
#
    38.99%  mplayer  libc-2.13.so         [.] memcpy
    11.54%  mplayer  libmpg123.so.0.29.6  [.] III_dequantize_sample
    11.00%  mplayer  libmpg123.so.0.29.6  [.] INT123_dct36_3dnowext
     8.19%  mplayer  libmpg123.so.0.29.6  [.] INT123_do_layer3
     5.05%  mplayer  ld-2.13.so           [.] do_lookup_x
     4.48%  mplayer  libmpg123.so.0.29.6  [.] synth_1to1_3dnowext_asm
     4.12%  mplayer  libmpg123.so.0.29.6  [.] .next_loop
     2.49%  mplayer  libmpg123.so.0.29.6  [.] INT123_dct64_3dnowext
     2.32%  mplayer  ld-2.13.so           [.] check_match.8296
     1.48%  mplayer  libc-2.13.so         [.] __GI___mempcpy
     0.81%  mplayer  libc-2.13.so         [.] _int_malloc
     0.80%  mplayer  mplayer              [.] demux_audio_fill_buffer
     0.73%  mplayer  ld-2.13.so           [.] strcmp
     0.69%  mplayer  libc-2.13.so         [.] __cfree
     0.64%  mplayer  ld-2.13.so           [.] .L180
     0.46%  mplayer  libmpg123.so.0.29.6  [.] buffered_forget
     0.42%  mplayer  mplayer              [.] ds_fill_buffer


# Events: 3K branch-misses
#
# Overhead  Command        Shared Object                           Symbol
# ........  .......  ...................  ...............................
#
    80.36%  mplayer  libmpg123.so.0.29.6  [.] III_dequantize_sample
     3.09%  mplayer  libmpg123.so.0.29.6  [.] synth_1to1_3dnowext_asm
     2.24%  mplayer  libmpg123.so.0.29.6  [.] INT123_do_layer3
     1.11%  mplayer  libmpg123.so.0.29.6  [.] INT123_dct64_3dnowext
     1.00%  mplayer  libmpg123.so.0.29.6  [.] synth_stereo_wrap
     0.67%  mplayer  libmpg123.so.0.29.6  [.] .next_loop
     0.64%  mplayer  libc-2.13.so         [.] memcpy
     0.64%  mplayer  [kernel.kallsyms]    [k] __lock_acquire
     0.61%  mplayer  libmpg123.so.0.29.6  [.] INT123_dct36_3dnowext
     0.50%  mplayer  libmpg123.so.0.29.6  [.] mpg123_decode
     0.46%  mplayer  libmpg123.so.0.29.6  [.] INT123_synth_1to1_3dnowext

==========
Strange things I noticed in the above reports:

- I see ld.so's do_lookup_x taking a noticeable amount of time. To
benchmark less of the startup and more of the runtime execution, I
first added "-loop 5". However, this had a much smaller impact than I
expected: the share only went from about 1.6% to 1.2%. It is quite
strange that something that (probably) looks up symbols is so
persistent in runtime benchmarks. Dalias had no explanation, other than
that I might have something nasty in LD_PRELOAD (I did not find
anything). The function shows up with both mp3lib and mpg123.

- There is a memcpy in glibc that accounts for the largest share of the
cache misses (24.61% with mp3lib, 38.99% with mpg123). I wonder how I
can make a nice percentage table of the functions that call this
memcpy().
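
One possible way to get such a caller table, as a sketch only: record
with call graphs (`perf record -g`, which the runs above did not use),
dump the samples with `perf script`, and count the frame directly below
memcpy in each stack. The Python below assumes the common text layout of
`perf script` (one header line, then indented "address symbol (dso)"
frames, samples separated by blank lines); that layout varies between
perf versions. The embedded input is made up, and caller_a, caller_b and
other_func are placeholders, not measured results.

```python
import re
from collections import Counter

# HYPOTHETICAL input in the general shape of `perf script` output.
# The stacks and the names caller_a/caller_b/other_func are invented.
SAMPLE = """\
mplayer  4242  100.000001:  1000 l1-dcache-load-misses:u:
        b7e5a3c0 memcpy (/lib/libc-2.13.so)
         80c1234 caller_a (/usr/bin/mplayer)
         80a0000 main (/usr/bin/mplayer)

mplayer  4242  100.000002:  1000 l1-dcache-load-misses:u:
        b7e5a3d4 memcpy+0x14 (/lib/libc-2.13.so)
         80c1250 caller_a+0x1c (/usr/bin/mplayer)
         80a0000 main (/usr/bin/mplayer)

mplayer  4242  100.000003:  1000 l1-dcache-load-misses:u:
        b7e5a3c0 memcpy (/lib/libc-2.13.so)
         80d4321 caller_b (/usr/bin/mplayer)
         80a0000 main (/usr/bin/mplayer)

mplayer  4242  100.000004:  1000 l1-dcache-load-misses:u:
         80e0000 other_func (/usr/bin/mplayer)
         80c1234 caller_a (/usr/bin/mplayer)
         80a0000 main (/usr/bin/mplayer)
"""

# An indented frame line: hex address, symbol (optionally +offset), "(dso)".
FRAME = re.compile(r"^\s+[0-9a-f]+\s+(\S+)\s+\(")


def callers_of(text, target="memcpy"):
    """Count the immediate caller of `target` over all samples in `text`."""
    counts = Counter()
    for block in text.strip().split("\n\n"):
        matches = (FRAME.match(line) for line in block.splitlines())
        # Keep only frame lines and drop any "+0x..." offset from the symbol.
        frames = [m.group(1).split("+")[0] for m in matches if m]
        if len(frames) >= 2 and frames[0] == target:
            counts[frames[1]] += 1
    return counts


counts = callers_of(SAMPLE)
total = sum(counts.values())
for name, n in counts.most_common():
    print(f"{100 * n / total:6.2f}%  {name}")
```

This counts samples rather than weighting them by the sample period, and
it only works if the binaries are built so that perf can unwind the
stack in the first place.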



Conclusion:
The slowdown may be caused by something as simple as moving variables
and buffers around.
III_dequantize_sample() seems to be the function that needs the most
love, and it also seems to be the core function. It has a lot of cache
misses and a lot of branch misses (about 80% of all branch misses with
either decoder), all things that older CPUs are quite bad at ;)
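
A quick way to see which functions are expensive per instruction is to
divide each function's share of cycles by its share of instructions in
the perf report tables above. The sketch below does that in Python; the
percentages are copied from the reports, everything else (names, layout)
is just for illustration. With only 2K-3K samples per event the ratios
are rough.

```python
# (cycles %, instructions %) per function, copied from the perf report
# tables above.
mp3lib = {
    "III_dequantize_sample": (31.31, 26.46),
    "synth_1to1_MMX": (18.69, 23.69),
    "dct64_MMX_3dnowex": (15.99, 18.06),
    "dct36_3dnowex": (15.58, 15.58),
    "do_layer3.clone.3": (8.13, 11.58),
}
mpg123 = {
    "III_dequantize_sample": (31.04, 23.14),
    "INT123_dct64_3dnowext": (15.34, 19.15),
    "INT123_dct36_3dnowext": (14.67, 15.96),
    "synth_1to1_3dnowext_asm": (11.26, 13.29),
    "INT123_do_layer3": (7.42, 10.96),
    ".next_loop": (7.15, 11.04),
}


def relative_cost(table):
    """Cycle share divided by instruction share for each function.

    A value above 1.0 means the function spends more cycles per
    instruction than the program does on average.
    """
    return {name: cyc / ins for name, (cyc, ins) in table.items()}


for label, table in (("mp3lib", mp3lib), ("mpg123", mpg123)):
    print(label)
    ratios = relative_cost(table)
    for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
        print(f"  {ratio:5.2f}  {name}")
```

In both decoders III_dequantize_sample is the only one of the big
functions whose ratio is clearly above 1.0.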

Best Regards.

