[Ffmpeg-devel] [PATCH] fix mpegaudiodec on ARM and benchmark

Wed Aug 23 20:32:21 CEST 2006

On Wednesday 23 August 2006 18:21, Michael Niedermayer wrote:

> On Wed, Aug 23, 2006 at 02:12:40PM +0200, Aurelien Jacobs wrote:
> > Hi,
> >
> > After the recent optimisation in mpegaudiodec, I've benchmarked mp3 on
> > ARM. But first, mpegaudiodec.c didn't compile, so I fixed it.
> > I guess I should commit the attached patch ?
> >
> > Here is how I benchmarked:
> >  ./mplayer -quiet -ac ffmp3 -ao pcm:fast:file=/dev/null -benchmark a.mp3
> >
> > ...
> > 
> > Overall 20% speedup, which is not so bad :-)

The speedup on the Nokia 770 was not as impressive, maybe because I
used the '-mcpu=arm926ej-s' option, which generates the best code for
my CPU, so the compiler already does a better job in this case. If you
did not use any arch options, the compiler generates armv3 code by
default, which is generally noticeably slower because 16-bit memory
access instructions were only introduced in armv4 (armv3 only had 8-bit
or 32-bit). So using at least -march=armv4 is rather important for
getting good performance.

But anyway, I'm glad that at least somebody else is interested in ARM
optimizations. I checked the ffmpeg commit log and did not see many
ARM-related changes recently, so I was not so optimistic. Well, let's make
ffmpeg faster on this platform :)

I also have a simple patch for the MULS/MACS macros. It provides quite a
noticeable performance improvement (with the --disable-libavcodec_mpegaudio_hp
option at least), but requires support for the armv5 edsp instructions, which
are not available on some machines (but Intel XScale should have them). If you
are interested in trying it, I can post it here, though it is quite trivial.
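
For illustration only, here is a minimal sketch of what a 16x16 multiply
and multiply-accumulate pair built on the edsp instructions smulbb/smlabb
could look like. This is not the patch mentioned above: the helper names
muls16/macs16 are hypothetical, and the __ARM_FEATURE_DSP check is an
assumption about how a current compiler would expose edsp support. On
any other target the plain C fallback is used.

```c
#include <stdint.h>

/* Hypothetical helper (not the actual MULS macro): 16x16 -> 32 bit multiply. */
static inline int32_t muls16(int16_t a, int16_t b)
{
#if defined(__arm__) && defined(__ARM_FEATURE_DSP)
    int32_t r;
    /* smulbb multiplies the bottom halfwords of both operands. */
    __asm__ ("smulbb %0, %1, %2"
             : "=r"(r)
             : "r"((int32_t)a), "r"((int32_t)b));
    return r;
#else
    return (int32_t)a * (int32_t)b;
#endif
}

/* Hypothetical helper (not the actual MACS macro): acc + a * b.
   The caller is assumed to keep the sum within the 32-bit range. */
static inline int32_t macs16(int32_t acc, int16_t a, int16_t b)
{
#if defined(__arm__) && defined(__ARM_FEATURE_DSP)
    int32_t r;
    /* smlabb: bottom halfword product plus a 32-bit accumulator. */
    __asm__ ("smlabb %0, %1, %2, %3"
             : "=r"(r)
             : "r"((int32_t)a), "r"((int32_t)b), "r"(acc));
    return r;
#else
    return acc + (int32_t)a * (int32_t)b;
#endif
}
```

Both variants only make sense when the operands really fit in 16 bits,
which is why this kind of change is tied to the non-high-precision build.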

But some better optimizations are still required to beat libmad :)

> > Index: mpegaudiodec.c
> > ===================================================================
> > --- mpegaudiodec.c	(revision 6050)
> > +++ mpegaudiodec.c	(working copy)
> > @@ -59,13 +59,13 @@
> >  #   define MULL(a, b) \
> >          ({  int lo, hi;\
> >              asm("smull %0, %1, %2, %3     \n\t"\
> > -                "mov   %0, %0,     lsr #%4\n\t"\
> > -                "add   %1, %0, %1, lsl #%5\n\t"\
> > -            : "=r"(lo), "=r"(hi)\
> > +                "mov   %0, %0,     lsr %4\n\t"\
> > +                "add   %1, %0, %1, lsl %5\n\t"\
> > +            : "=&r"(lo), "=&r"(hi)\
> >              : "r"(b), "r"(a), "i"(FRAC_BITS), "i"(32-FRAC_BITS));\
> >           hi; })
> >  #   define MUL64(a,b) ((int64_t)(a) * (int64_t)(b))
> > -#   define MULH(a, b) ({ int lo, hi; asm ("smull %0, %1, %2, %3" : "=r"(lo), "=r"(hi) : "r"(b),"r"(a)); hi; })
> > +#   define MULH(a, b) ({ int lo, hi; asm ("smull %0, %1, %2, %3" : "=&r"(lo), "=&r"(hi) : "r"(b),"r"(a)); hi; })
>
> i think not all 4 of the & are needed, but iam not sure ...

I think none of the & are required; they would just force the compiler not to
reuse input registers for the output arguments, restricting optimization
possibilities a bit.
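
As a reference point when testing either variant of the constraints, the
two asm macros from the quoted patch can be compared against a plain C
version. The sketch below is only that: the names mull_ref/mulh_ref are
hypothetical, FRAC_BITS = 23 is an assumed value for the high-precision
build, and it relies on the compiler doing an arithmetic right shift on
negative 64-bit values (gcc does).

```c
#include <stdint.h>

/* Assumed value for illustration; the real one comes from mpegaudiodec.c. */
#define FRAC_BITS 23

/* Reference for MULL: full 64-bit product shifted right by FRAC_BITS,
   truncated to 32 bits. This mirrors the smull / lsr / add-with-lsl
   sequence in the asm version. */
static inline int mull_ref(int a, int b)
{
    return (int)(((int64_t)a * (int64_t)b) >> FRAC_BITS);
}

/* Reference for MULH: upper 32 bits of the 64-bit product, i.e. the
   "hi" register written by smull. */
static inline int mulh_ref(int a, int b)
{
    return (int)(((int64_t)a * (int64_t)b) >> 32);
}
```

Running the decoder's test vectors with the asm macros swapped for these
should give bit-identical output if the asm and its constraints are right.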