[MPlayer-dev-eng] [PATCH] Enable mp3lib's SSE routines on AMD64.

Sun May 6 00:32:51 CEST 2007

Hi,
On May 5, 2007, at 11:28 , Attila Kinali wrote:

> On Fri, 4 May 2007 17:52:38 +0200
> Guillaume Poirier <gpoirier at mplayerhq.hu> wrote:
>
>> Attached patch allows to $SUBJ
>> It does so by moving array costab_mmx from decode_MMX.c (which can't
>> be compiled on AMD64) to a separate file costab_MMX.c.
>>
>> I've played a couple of MP3s on my Hi-Fi stereo and as far as I hear,
>> there aren't any artifacts, which isn't surprising.
>>
>> I'm sure it can be improved, so I'm interested in hearing from you
>> guys.
>
> Patch works, output is correct (binary compare), but it's slightly
> slower on my machine (AMD Athlon(tm) 64 Processor 3700+):
>
> ./mplayer -ao pcm:file=/dev/null /tmp/01\ -\ Standing\ in\ the\ Sunset\ Glow.mp3 -benchmark -quiet
>
> w/o patch:
> BENCHMARKs: VC:   0.000s VO:   0.000s A:   5.746s Sys:   0.018s =    5.764s
>
> w/ patch:
> BENCHMARKs: VC:   0.000s VO:   0.000s A:   5.807s Sys:   0.017s =    5.824s
>
> This is a difference of 1%

Ouch! This wasn't intended! Now that you bring it up, I realise I
never benchmarked it at all, on AMD64 or anywhere else; I simply
assumed it could only be faster.
Here are the performance figures on the 2 other CPUs that support
x86-64 mode:

Core2:
--- without:
BENCHMARKs: VC:   0.000s VO:   0.000s A:   1.668s Sys:   0.005s =    1.674s
BENCHMARK%: VC:  0.0000% VO:  0.0000% A: 99.6815% Sys:  0.3185% = 100.0000%

real    0m1.687s
user    0m1.676s
sys     0m0.012s

--- with:
BENCHMARKs: VC:   0.000s VO:   0.000s A:   1.677s Sys:   0.005s =    1.682s
BENCHMARK%: VC:  0.0000% VO:  0.0000% A: 99.6801% Sys:  0.3199% = 100.0000%

Exiting... (End of file)

real    0m1.696s
user    0m1.688s
sys     0m0.004s

P4:
--- without:

BENCHMARKs: VC:   0.000s VO:   0.000s A:   2.283s Sys:   0.006s =    2.289s
BENCHMARK%: VC:  0.0000% VO:  0.0000% A: 99.7250% Sys:  0.2750% = 100.0000%

real    0m2.326s
user    0m2.292s
sys     0m0.028s

--- with:
BENCHMARKs: VC:   0.000s VO:   0.000s A:   2.293s Sys:   0.006s =    2.299s
BENCHMARK%: VC:  0.0000% VO:  0.0000% A: 99.7325% Sys:  0.2675% = 100.0000%

real    0m2.336s
user    0m2.312s
sys     0m0.024s
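
To put the three machines on the same scale, here is a small sketch
that turns the "A:" (audio decode) times quoted in this thread into a
relative change. The names (timings, slowdown_percent, results) are
only illustrative and not part of MPlayer. Each figure comes from a
single run, so differences this small may well be within measurement
noise.

```python
# Audio decode times (the "A:" column) in seconds, copied from the
# BENCHMARKs lines in this thread: (without patch, with patch).
timings = {
    "Athlon 64 3700+": (5.746, 5.807),
    "Core2": (1.668, 1.677),
    "P4": (2.283, 2.293),
}


def slowdown_percent(without, with_patch):
    """Change in decode time as a percentage of the unpatched time.

    Positive means the patched build was slower.
    """
    return (with_patch - without) / without * 100.0


results = {cpu: slowdown_percent(*pair) for cpu, pair in timings.items()}

for cpu, pct in results.items():
    print(f"{cpu}: {pct:+.2f}%")
```

By this measure the patched build is roughly 1% slower on the Athlon 64
and about half a percent slower on the Core2 and the P4.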

I don't see much reason why the SSE version would be slower on all
CPUs. I may have fumbled the patch somewhere....

I also wonder if these SSE routines have ever been faster on common
CPUs....

Guillaume