[Ffmpeg-devel] [PATCH] fix mpegaudiodec on ARM and benchmark
Aurelien Jacobs
aurel
Thu Aug 24 00:08:22 CEST 2006
On Wed, 23 Aug 2006 17:21:29 +0200
Michael Niedermayer <michaelni at gmx.at> wrote:
> Hi
>
> On Wed, Aug 23, 2006 at 02:12:40PM +0200, Aurelien Jacobs wrote:
> > Hi,
> >
> > After the recent optimisation in mpegaudiodec, I've benchmarked mp3 on ARM.
> > But first, mpegaudiodec.c didn't compile, so I fixed it.
> > I guess I should commit the attached patch ?
> >
> > Here is how I benchmarked:
> > ./mplayer -quiet -ac ffmp3 -ao pcm:fast:file=/dev/null -benchmark a.mp3
> >
> > And here are the results with various lavc revisions (Xscale IXP420).
> >
> > r6036
> > BENCHMARKs: VC: 0.000s VO: 0.000s A: 216.931s Sys: 0.414s = 217.346s
> > BENCHMARK%: VC: 0.0000% VO: 0.0000% A: 99.8093% Sys: 0.1907% = 100.0000%
> >
> > r6037
> > BENCHMARKs: VC: 0.000s VO: 0.000s A: 212.347s Sys: 0.412s = 212.759s
> > BENCHMARK%: VC: 0.0000% VO: 0.0000% A: 99.8062% Sys: 0.1938% = 100.0000%
> >
> > r6039
> > BENCHMARKs: VC: 0.000s VO: 0.000s A: 212.703s Sys: 0.411s = 213.114s
> > BENCHMARK%: VC: 0.0000% VO: 0.0000% A: 99.8070% Sys: 0.1930% = 100.0000%
> >
> > r6050 (patched)
> > BENCHMARKs: VC: 0.000s VO: 0.000s A: 170.642s Sys: 0.411s = 171.053s
> > BENCHMARK%: VC: 0.0000% VO: 0.0000% A: 99.7597% Sys: 0.2403% = 100.0000%
> >
> > Overall 20% speedup, which is not so bad :-)
>
> was this with or without --disable-libavcodec_mpegaudio_hp ?
All my tests are done *not* using --disable-libavcodec_mpegaudio_hp.
> > Index: mpegaudiodec.c
> > ===================================================================
> > --- mpegaudiodec.c (revision 6050)
> > +++ mpegaudiodec.c (working copy)
> > @@ -59,13 +59,13 @@
> > # define MULL(a, b) \
> > ({ int lo, hi;\
> > asm("smull %0, %1, %2, %3 \n\t"\
> > - "mov %0, %0, lsr #%4\n\t"\
> > - "add %1, %0, %1, lsl #%5\n\t"\
> > - : "=r"(lo), "=r"(hi)\
> > + "mov %0, %0, lsr %4\n\t"\
> > + "add %1, %0, %1, lsl %5\n\t"\
> > + : "=&r"(lo), "=&r"(hi)\
> > : "r"(b), "r"(a), "i"(FRAC_BITS), "i"(32-FRAC_BITS));\
> > hi; })
> > # define MUL64(a,b) ((int64_t)(a) * (int64_t)(b))
> > -# define MULH(a, b) ({ int lo, hi; asm ("smull %0, %1, %2, %3" : "=r"(lo), "=r"(hi) : "r"(b),"r"(a)); hi; })
> > +# define MULH(a, b) ({ int lo, hi; asm ("smull %0, %1, %2, %3" : "=&r"(lo), "=&r"(hi) : "r"(b),"r"(a)); hi; })
>
> I think not all 4 of the & are needed, but I am not sure ...
If I remove any one of them, I get a load of messages like this one:
{standard input}: Assembler messages:
{standard input}:630: rdhi, rdlo and rm must all be different
Note that I'm cross-compiling with gcc-4.1 if that's relevant.
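For reference, my understanding of why the '&' (earlyclobber) modifiers matter: without them gcc is free to place an output in the same register as one of the inputs, and on ARM cores before ARMv6 smull requires RdLo, RdHi and Rm to be different registers, which matches the assembler message above. Below is a minimal portable C sketch of what the two macros are meant to compute, useful for checking the asm against. The names mull_ref, mulh_ref, mull_split and check_mull are made up for illustration, FRAC_BITS = 23 is an assumption (it is not taken from the patch), and the code relies on >> of a negative value being an arithmetic shift, as gcc does.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for illustration only: the fixed-point precision in use. */
#define FRAC_BITS 23

/* Reference for MULL: 64-bit product scaled back down by FRAC_BITS. */
int mull_ref(int a, int b)
{
    return (int)(((int64_t)a * (int64_t)b) >> FRAC_BITS);
}

/* Reference for MULH: high 32 bits of the 64-bit product. */
int mulh_ref(int a, int b)
{
    return (int)(((int64_t)a * (int64_t)b) >> 32);
}

/* Mirrors the three asm instructions on the lo/hi halves of the product. */
int mull_split(int a, int b)
{
    int64_t  p  = (int64_t)a * (int64_t)b;        /* smull lo, hi, b, a    */
    uint32_t lo = (uint32_t)p;
    uint32_t hi = (uint32_t)((uint64_t)p >> 32);

    lo >>= FRAC_BITS;                             /* mov lo, lo, lsr #FRAC */
    hi   = lo + (hi << (32 - FRAC_BITS));         /* add hi, lo, hi, lsl # */
    return (int)hi;
}

/* Hypothetical self-check: both formulations agree on a few inputs. */
void check_mull(void)
{
    assert(mull_split(1 << 23, 1 << 23) == mull_ref(1 << 23, 1 << 23));
    assert(mull_split(-(1 << 23), 1 << 22) == mull_ref(-(1 << 23), 1 << 22));
    assert(mull_split(12345, -6789) == mull_ref(12345, -6789));
}
```

With FRAC_BITS = 23, the value 1 << 23 represents 1.0, so mull_ref(1 << 23, x) should return x unchanged as long as the result fits in 32 bits.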
> Also please try to exchange a and b; some ARM CPUs need less time to do
> multiplications if the right one of these is small, but I don't know which
> one it was ...
The current order seems to be the fastest, but the difference is very slight.
> and another idea, try to set -mcpu -march -mtune correctly for the cpu
When setting -march=armv4 or armv4t or armv5 or armv5t it doesn't even compile:
arm-linux-gnu-gcc -DHAVE_AV_CONFIG_H -I.. -I../libavutil -Wdeclaration-after-statement -march=armv5t -D_REENTRANT -I/usr/include -I/usr/src/DVB/ost/include -I/usr/include/dxr2 -I/usr/local/include/cdda -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_ISOC9X_SOURCE -c -o armv4l/dsputil_arm_s.o armv4l/dsputil_arm_s.S
armv4l/dsputil_arm_s.S: Assembler messages:
armv4l/dsputil_arm_s.S:77: Error: selected processor does not support `pld [r1]'
armv4l/dsputil_arm_s.S:88: Error: selected processor does not support `pld [r1]'
[...]
Setting -march=armv5te (which is exactly what my Xscale is) is noticeably
slower, and I don't understand why:
BENCHMARKs: VC: 0.000s VO: 0.000s A: 206.553s Sys: 0.438s = 206.991s
BENCHMARK%: VC: 0.0000% VO: 0.0000% A: 99.7882% Sys: 0.2118% = 100.0000%
Now I also benchmarked libmad. It's still "slightly" faster !
BENCHMARKs: VC: 0.000s VO: 0.000s A: 54.212s Sys: 0.407s = 54.618s
BENCHMARK%: VC: 0.0000% VO: 0.0000% A: 99.2554% Sys: 0.7446% = 100.0000%
Then I benchmarked ffmp3 r6050 (patched) with --disable-libavcodec_mpegaudio_hp:
BENCHMARKs: VC: 0.000s VO: 0.000s A: 78.751s Sys: 0.419s = 79.171s
BENCHMARK%: VC: 0.0000% VO: 0.0000% A: 99.4702% Sys: 0.5298% = 100.0000%
Pretty impressive ! Not so far from libmad !
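To put the numbers above side by side, here is a small C sketch of the arithmetic, using only the "A:" decode times quoted in this mail. The helper names speedup_percent and time_ratio are invented for illustration and are not part of mplayer or lavc.

```c
#include <assert.h>

/* Percentage of decode time saved when going from before_s to after_s. */
double speedup_percent(double before_s, double after_s)
{
    assert(before_s > 0.0);
    return 100.0 * (before_s - after_s) / before_s;
}

/* How many times longer t_s is than the reference time ref_s. */
double time_ratio(double t_s, double ref_s)
{
    assert(ref_s > 0.0);
    return t_s / ref_s;
}

/*
 * Example inputs, taken from the BENCHMARKs lines in this mail:
 *   speedup_percent(216.931, 170.642)   r6036 vs r6050 (patched)
 *   time_ratio(170.642, 54.212)         r6050 (patched) vs libmad
 *   time_ratio(78.751, 54.212)          r6050 without mpegaudio_hp vs libmad
 */
```

With these inputs the first call should give a bit over 21%, the second a factor of roughly 3.15, and the third a factor of roughly 1.45.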
Aurel