[FFmpeg-devel] [PATCH v3] SH4 mpegaudio decoder optimizations
Guennadi Liakhovetski
g.liakhovetski
Sun Dec 26 19:49:29 CET 2010
On Sun, 26 Dec 2010, Michael Niedermayer wrote:
> On Fri, Dec 17, 2010 at 05:54:56PM +0100, Guennadi Liakhovetski wrote:
> > Hi all
> >
> > This is version 3 of the following patch:
> >
> > This patch implements several mpegaudio optimizations for SH4 FPU-enabled
> > SoCs. Verified to provide more than 40% acceleration, when decoding a
> > 128kbps stereo mp3 audio file to /dev/null.
> >
> > Changes listed in the patch header, thanks to all who commented!
> >
> > Thanks
> > Guennadi
> > ---
> > Guennadi Liakhovetski, Ph.D.
> > Freelance Open-Source Software Developer
> > http://www.open-technology.de/
> > mathops.h | 2 +
> > mpegaudiodec.c | 26 ++++++++++++---
> > sh4/mathops.h | 97 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > 3 files changed, 120 insertions(+), 5 deletions(-)
> > ad802c254a9f25cf487eba7d3cb584112c6c59b9 ff_mpegaudio_sh4-v3.patch
> > From: Guennadi Liakhovetski <g.liakhovetski at gmx.de>
> > Subject: [PATCH v3] SH4 mpegaudio decoder optimizations
> >
> > This patch implements several mpegaudio optimizations for SH4 FPU-enabled SoCs.
> > Verified to provide more than 40% acceleration, when decoding a 128kbps stereo
> > mp3 audio file to /dev/null.
> > ---
> >
> > v3:
> > 1. added a "&" constraint to input-output assembly parameters
> >
> > v2:
> > 1. fixed copyright year to 2010
> > 2. consistent lower-case register names
> > 3. joined inline assembly macros into one
> >
> > diff --git a/libavcodec/mathops.h b/libavcodec/mathops.h
> > index 4d88ed1..b234ae4 100644
> > --- a/libavcodec/mathops.h
> > +++ b/libavcodec/mathops.h
> > @@ -34,6 +34,8 @@
> > # include "mips/mathops.h"
> > #elif ARCH_PPC
> > # include "ppc/mathops.h"
> > +#elif ARCH_SH4
> > +# include "sh4/mathops.h"
> > #elif ARCH_X86
> > # include "x86/mathops.h"
> > #endif
> > diff --git a/libavcodec/mpegaudiodec.c b/libavcodec/mpegaudiodec.c
> > index 769be89..513fa1b 100644
> > --- a/libavcodec/mpegaudiodec.c
> > +++ b/libavcodec/mpegaudiodec.c
> > @@ -584,6 +584,22 @@ static inline int round_sample(int64_t *sum)
> > op(sum, (w)[7 * 64], (p)[7 * 64]); \
> > }
> >
> > +#ifndef SUM8_MACS
> > +#define SUM8_MACS(sum, w, p) SUM8(MACS, sum, w, p)
> > +#endif
> > +
> > +#ifndef SUM8_MLSS
> > +#define SUM8_MLSS(sum, w, p) SUM8(MLSS, sum, w, p)
> > +#endif
> > +
> > +#ifndef SUM8P2_MACS_MLSS
> > +#define SUM8P2_MACS_MLSS(sum, sum2, w, w2, p) SUM8P2(sum, MACS, sum2, MLSS, w, w2, p)
> > +#endif
> > +
> > +#ifndef SUM8P2_MLSS_MLSS
> > +#define SUM8P2_MLSS_MLSS(sum, sum2, w, w2, p) SUM8P2(sum, MLSS, sum2, MLSS, w, w2, p)
> > +#endif
> > +
> > #define SUM8P2(sum1, op1, sum2, op2, w1, w2, p) \
> > { \
> > INTFLOAT tmp;\
> > @@ -666,9 +682,9 @@ static void apply_window_mp3_c(MPA_INT *synth_buf, MPA_INT *window,
> >
> > sum = *dither_state;
> > p = synth_buf + 16;
> > - SUM8(MACS, sum, w, p);
> > + SUM8_MACS(sum, w, p);
> > p = synth_buf + 48;
> > - SUM8(MLSS, sum, w + 32, p);
> > + SUM8_MLSS(sum, w + 32, p);
> > *samples = round_sample(&sum);
> > samples += incr;
> > w++;
>
> This is ugly
> You should implement the API not change it in one file and implement the
> modified API
This doesn't change any API. This just adds a couple of macros completely
internal to this file, no existing users are affected.
> [...]
> > +
> > +#define SH_SUM8_MACS(sum, w, p) \
> > +do { \
> > + register const int32_t *wp = (w), *pp = (p); \
> > + register int32_t tmp = 63 * 4; \
> > + union {int64_t x; int32_t u32[2];} u = \
> > + {.x = (sum),}; \
> > + __asm__ volatile( \
> > + " lds %[hi], mach\n" \
> > + " lds %[lo], macl\n" \
> > + " mac.l @%[wp]+, @%[pp]+\n" /* 0 */ \
> > + " add %[tmp], %[wp]\n" \
> > + " add %[tmp], %[pp]\n" \
> > + " mac.l @%[wp]+, @%[pp]+\n" /* 1 */ \
> > + " add %[tmp], %[wp]\n" \
> > + " add %[tmp], %[pp]\n" \
> > + " mac.l @%[wp]+, @%[pp]+\n" /* 2 */ \
> > + " add %[tmp], %[wp]\n" \
> > + " add %[tmp], %[pp]\n" \
> > + " mac.l @%[wp]+, @%[pp]+\n" /* 3 */ \
> > + " add %[tmp], %[wp]\n" \
> > + " add %[tmp], %[pp]\n" \
> > + " mac.l @%[wp]+, @%[pp]+\n" /* 4 */ \
> > + " add %[tmp], %[wp]\n" \
> > + " add %[tmp], %[pp]\n" \
> > + " mac.l @%[wp]+, @%[pp]+\n" /* 5 */ \
> > + " add %[tmp], %[wp]\n" \
> > + " add %[tmp], %[pp]\n" \
> > + " mac.l @%[wp]+, @%[pp]+\n" /* 6 */ \
> > + " add %[tmp], %[wp]\n" \
> > + " add %[tmp], %[pp]\n" \
> > + " mac.l @%[wp]+, @%[pp]+\n" /* 7 */ \
> > + " sts mach, %[hi]\n" \
> > + " sts macl, %[lo]" \
> > + : [hi] "+&r" (u.u32[1]), \
> > + [lo] "+&r" (u.u32[0]), \
> > + [wp] "+&r" (wp), \
> > + [pp] "+&r" (pp) \
> > + : [tmp] "r" (tmp) \
> > + : "mach", "macl"); \
> > + sum = u.x; \
> > +} while (0)
>
> Write the whole loop in asm, leaving it to gcc is not a good idea it can just
> mess up.
Sorry, what do you mean? how can it "just mess up?" Actually, what loop do
you mean? You don't mean the "do {...} while (0)" construct, do you?
> also i assume there are no multiply and subtract instructions in SH4 ?
no.
> btw do you have a instruction set reference you can share or a url to one ?
Thanks to Bobby Bingham for sending this to you.
Thanks
Guennadi
---
Guennadi Liakhovetski, Ph.D.
Freelance Open-Source Software Developer
http://www.open-technology.de/
More information about the ffmpeg-devel
mailing list