[FFmpeg-devel] [PATCH 2/2] mips: Optimization of AC3 FP encoder and EAC3 FP decoder

Zivkovic, Bojan (c) bojan at mips.com
Tue Oct 23 13:13:30 CEST 2012


Hello!

Nedeljko will be away from work for the next few days, so I will respond
to the feedback on the submitted MIPS optimizations.

> > --- a/libavcodec/ac3enc.h
> > +++ b/libavcodec/ac3enc.h
> > @@ -256,6 +256,8 @@ typedef struct AC3EncodeContext {
> >       /* fixed vs. float templated function pointers */
> >       int  (*allocate_sample_buffers)(struct AC3EncodeContext *s);
> >
> > +    void (*apply_mdct)(struct AC3EncodeContext *s);
> 
> Strange. C code for apply_mdct() is basically just calling a
> DSPContext.apply_window() inside a loop. So why do you need to make an
> ASM version out of it instead of just writing a MIPS version of
> apply_window()? Is the function call too expensive?

The optimization was placed in apply_mdct() because the loop length there is constant
(AC3_WINDOW_SIZE = 512), which we could leverage in our optimization.

But you are right: the apply_mdct() optimization really just optimizes apply_window(). Moreover,
apply_window() consists only of a call to the vector_fmul_c() function from libavutil/float_dsp.c, so we can
optimize that function instead, and performance will stay about the same.
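
As a rough illustration only (plain C, not the MIPS assembly from the patch), the
element-wise multiply that vector_fmul_c() performs, and the kind of unrolling that a
known length such as AC3_WINDOW_SIZE = 512 allows, could look like the sketch below.
The names vector_fmul_ref and vector_fmul_unrolled4 are hypothetical, and the unrolled
variant assumes that len is a positive multiple of 4.

```c
#include <stddef.h>

/* Reference form: dst[i] = src0[i] * src1[i] for every element. */
static void vector_fmul_ref(float *dst, const float *src0,
                            const float *src1, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] = src0[i] * src1[i];
}

/* Hypothetical unrolled form. Assumption: len is a positive multiple
 * of 4 (true for 512), so no remainder loop is needed. */
static void vector_fmul_unrolled4(float *dst, const float *src0,
                                  const float *src1, int len)
{
    for (int i = 0; i < len; i += 4) {
        dst[i]     = src0[i]     * src1[i];
        dst[i + 1] = src0[i + 1] * src1[i + 1];
        dst[i + 2] = src0[i + 2] * src1[i + 2];
        dst[i + 3] = src0[i + 3] * src1[i + 3];
    }
}
```

Both variants compute the same products; the unrolled one only trades the per-element
loop overhead for a stricter requirement on len.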

> > +static void ff_ac3_float_apply_channel_coupling_mips(AC3EncodeContext *s)
> > +{
> > +    LOCAL_ALIGNED_16(CoefType, cpl_coords,      [AC3_MAX_BLOCKS], [AC3_MAX_CHANNELS][16]);
> > +    LOCAL_ALIGNED_16(int32_t, fixed_cpl_coords, [AC3_MAX_BLOCKS], [AC3_MAX_CHANNELS][16]);
> > +    int blk, ch, bnd, i, j;
> > +    CoefSumType energy[AC3_MAX_BLOCKS][AC3_MAX_CHANNELS][16] = {{{0}}};
> > +    int cpl_start, num_cpl_coefs;
> > +    int32_t  *dst;
> > +    const float *src;
> > +    unsigned int len;
> > +    uint8_t *exp;
> > +    float scale = 1 << 24;
> > +    float src0, src1, src2, src3, src4, src5, src6, src7;
> > +    int temp0, temp1, temp2, temp3, temp4, temp5, temp6, temp7;
> > +    int e,v;
>
> Wow, this is a pretty complex function and optimizing it this way makes
> for a lot of code duplication. Can't you just extract the time-consuming
> parts to some DSP function?

The optimized parts of this function are the calls to ac3dsp.float_to_fixed24 and ac3dsp.extract_exponents,
so the optimizations will be moved into those functions.
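
For context, here is a minimal plain-C sketch of what these two steps do as I understand
them: scaling floats by 1 << 24 (the same scale constant as in the quoted function) and
deriving an exponent from the magnitude of each fixed-point coefficient. The function
names, the rounding helper and the exponent formula (23 minus the index of the highest
set bit, 24 for zero) are assumptions made for illustration, not a copy of the
libavcodec code.

```c
#include <stdint.h>

/* Hypothetical sketch: convert floats to 24-bit fixed point.
 * Assumption: round to nearest; done in double to keep it exact. */
static void float_to_fixed24_sketch(int32_t *dst, const float *src, int len)
{
    const double scale = (double)(1 << 24);
    for (int i = 0; i < len; i++) {
        double v = (double)src[i] * scale;
        dst[i] = (int32_t)(v + (v >= 0.0 ? 0.5 : -0.5));
    }
}

/* Index of the highest set bit of v (v must be non-zero). */
static int ilog2_sketch(uint32_t v)
{
    int n = 0;
    while (v >>= 1)
        n++;
    return n;
}

/* Hypothetical sketch: exponent = 23 - log2(|coef|), or 24 for zero. */
static void extract_exponents_sketch(uint8_t *exps, const int32_t *coef,
                                     int nb_coefs)
{
    for (int i = 0; i < nb_coefs; i++) {
        uint32_t v = coef[i] < 0 ? 0u - (uint32_t)coef[i] : (uint32_t)coef[i];
        exps[i] = v ? (uint8_t)(23 - ilog2_sketch(v)) : 24;
    }
}
```

Both loops are simple per-element passes over the data, which is why they are natural
candidates for separate DSP functions.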

-Bojan

