[FFmpeg-devel] Fixpoint FFT optimization, with MDCT and IMDCT wrappers for audio optimization

Mon Jul 30 13:29:13 CEST 2007

Hi

On Sun, Jul 29, 2007 at 08:13:38PM -0400, Marc Hoffman wrote:
> On 7/29/07, Diego Biurrun <diego at biurrun.de> wrote:
> > On Sun, Jul 29, 2007 at 07:20:59PM -0400, Marc Hoffman wrote:
> > >
> > > sorry about the mime type gmail doesn't allow me to mark it as
> > > text/x-patch.  This makes config changes.
> >
> > > --- configure (revision 9807)
> > > +++ configure (working copy)
> > > @@ -573,6 +574,7 @@
> > >      bktr
> > >      dc1394
> > >      dv1394
> > > +    fixedpoint
> > >      ffmpeg
> > >      ffplay
> > >      ffserver
> > > @@ -665,6 +667,7 @@
> > >      fast_64bit
> > >      fast_cmov
> > >      fast_unaligned
> > > +    fixedpoint
> > >      fork
> > >      freetype2
> > >      GetProcessTimes
> >
> > Just CONFIG_LIST is enough.
> >
> > > --- libavcodec/Makefile       (revision 9807)
> > > +++ libavcodec/Makefile       (working copy)
> > > @@ -358,6 +358,10 @@
> > >  OBJS-$(CONFIG_VP6F_DECODER)            += i386/vp3dsp_mmx.o i386/vp3dsp_sse2.o
> > >  endif
> > >
> > > +ifeq ($(HAVE_FIXEDPOINT),yes)
> > > +OBJS += fft_fixedpoint.o
> > > +endif
> >
> > Do this in one line, like for all the other files.
> >
> 
> Ok guys, I removed myself from the HAVE list...  And corrected the
> Makefile like you asked before.  Much simpler.  Again, sorry about the
> MIME attachment....
[...]
> +/*
> +  This is a fixpoint inplace 16bit FFT which accepts 3 arguments:
> +
> +  @param X   - input signal in format 1.15
> +  @param W   - phase factors in 1.15 format
> +  @param lgN - log_2(N) where N is the size of the input data set.
> +
> +  X is the output and its adjusted format is S(1+lgN.15-lgN) i.e.
> +    if we are talking about a 256 point fft then the output format is 9.6.
> +*/

not doxygen compatible
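
A minimal sketch of a Doxygen-parsable form of that comment: the block opens with `/**` and each parameter uses `@param name description`. The function name `fft16_fixed`, its prototype, the helper `fft16_out_frac_bits` and the `FFTComplex16` layout are assumptions for illustration, not taken from the patch. Note that the formula as written, S(1+lgN.15-lgN), gives 9.7 for a 256-point transform (lgN = 8).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of the patch's complex type: 16-bit real and imaginary parts. */
typedef struct { int16_t re, im; } FFTComplex16;

/**
 * In-place 16-bit fixed-point FFT (declaration only, hypothetical name).
 *
 * @param X   input signal in 1.15 format; overwritten with the output,
 *            whose format is (1+lgN).(15-lgN)
 * @param W   phase factors in 1.15 format
 * @param lgN log2(N), where N is the number of input points
 */
void fft16_fixed(FFTComplex16 *X, const FFTComplex16 *W, int lgN);

/** Number of fractional bits left in the output for a given lgN. */
static inline int fft16_out_frac_bits(int lgN)
{
    assert(lgN >= 0 && lgN <= 15); /* a 1.15 input has only 15 bits to give up */
    return 15 - lgN;
}
```

For lgN = 8 the helper returns 7, i.e. a 9.7 output format.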

[...]
> +                tr        = (X[k2].re*wwr + 0x4000)>>15;
> +                tr       -= (X[k2].im*wwi + 0x4000)>>15;
> +                ti        = (X[k2].re*wwi + 0x4000)>>15;
> +                ti       += (X[k2].im*wwr + 0x4000)>>15;
> +
> +                X[k2].re  = (X[k].re - tr)>>1;
> +                X[k2].im  = (X[k].im - ti)>>1;
> +
> +                X[k].re   = (X[k].re + tr)>>1;
> +                X[k].im   = (X[k].im + ti)>>1;

Why not >>16? That way you would have 4 fewer shifts.
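
One possible reading of the >>16 suggestion, as a sketch only: fold the final >>1 into the multiply shift and halve X[k] once per component. That leaves 2 shifts by 16 and 2 shifts by 1 instead of 4 by 15 and 4 by 1. The function names and the `FFTComplex16` layout are assumptions; rounding differs slightly, so results can differ from the quoted code by 1 LSB.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct { int16_t re, im; } FFTComplex16; /* assumed layout */

/* Butterfly as in the quoted patch: four >>15 and four >>1.
 * Like the patch, this assumes >> on negative ints is an arithmetic shift. */
static void butterfly_shift15(FFTComplex16 *a, FFTComplex16 *b, int wwr, int wwi)
{
    int tr, ti;
    tr  = (b->re*wwr + 0x4000)>>15;
    tr -= (b->im*wwi + 0x4000)>>15;
    ti  = (b->re*wwi + 0x4000)>>15;
    ti += (b->im*wwr + 0x4000)>>15;
    b->re = (a->re - tr)>>1;
    b->im = (a->im - ti)>>1;
    a->re = (a->re + tr)>>1;
    a->im = (a->im + ti)>>1;
}

/* Variant with the halving folded into the product shift: two >>16, two >>1. */
static void butterfly_shift16(FFTComplex16 *a, FFTComplex16 *b, int wwr, int wwi)
{
    int tr, ti, xr, xi;
    /* The summed products fit in 32 bits only for 16-bit, unit-magnitude twiddles. */
    assert(wwr >= -32768 && wwr <= 32767 && wwi >= -32768 && wwi <= 32767);
    tr = (b->re*wwr - b->im*wwi + 0x8000)>>16;
    ti = (b->re*wwi + b->im*wwr + 0x8000)>>16;
    xr = a->re>>1;
    xi = a->im>>1;
    b->re = xr - tr;
    b->im = xi - ti;
    a->re = xr + tr;
    a->im = xi + ti;
}
```

Whether the 1 LSB difference is acceptable depends on the codec's precision requirements, which would need to be checked against the reference FFT.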

[...]
> +        w=0;
> +	hm = m>>1;
> +        for (j=0; j<hm; j++) {

tabs

[...]
> +
> +                X[k2].re  = (X[k].re - tr);
> +                X[k2].im  = (X[k].im - ti);
> +
> +                X[k].re   = (X[k].re + tr);
> +                X[k].im   = (X[k].im + ti);

superfluous ()

[...]
> +                tr        = (X[k2].re*wwr + 0x40000000)>>31;
> +                tr       -= (X[k2].im*wwi + 0x40000000)>>31;
> +                ti        = (X[k2].re*wwi + 0x40000000)>>31;
> +                ti       += (X[k2].im*wwr + 0x40000000)>>31;

>>32 !!
with >>31 this code just is not useful: CPUs tend to have 32-bit registers,
not 31-bit ones, so this is just the same as the other code.
With >>32 AND without the + 0x... several operations could be avoided,
thus making this as fast as the 16-bit code on a reasonable CPU.
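
A sketch of the difference, assuming 32-bit operands and a 64-bit intermediate product (the function names are hypothetical). With >>32 the result is simply the high word of the product, which many 32-bit CPUs deliver directly from a widening multiply; the >>31 form needs a 64-bit add and a shift that spans two registers. The >>32 result carries an extra factor of 1/2, which a butterfly that halves its output anyway could absorb.

```c
#include <assert.h>
#include <stdint.h>

/* As in the quoted patch: rounding add, then >>31 of the 64-bit product. */
static int32_t mul_q31_round(int32_t a, int32_t b)
{
    int64_t r = ((int64_t)a * b + 0x40000000) >> 31;
    assert(r >= INT32_MIN && r <= INT32_MAX); /* a = b = INT32_MIN would overflow */
    return (int32_t)r;
}

/* High word only: no rounding add, result is scaled by an extra 1/2. */
static int32_t mul_hi32(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) >> 32);
}
```

For a = b = 0x40000000 (0.5 in 1.31 format), `mul_q31_round` yields 0x20000000 and `mul_hi32` yields 0x10000000, i.e. the same value at half the scale.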

[...]
> +    FFTComplex16 *v = av_malloc (sizeof (short)*n);

type mismatch
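
A sketch of the usual way to avoid this kind of mismatch: size the allocation from the pointer's own element type. It assumes `FFTComplex16` holds two 16-bit members (the patch's definition is not quoted here) and uses plain `malloc` as a stand-in for `av_malloc`; the helper name is hypothetical. If n counts complex samples, `sizeof (short)*n` would allocate only half of what n such elements need.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct { int16_t re, im; } FFTComplex16; /* assumed layout */

/* Allocate n complex samples; sizeof(*v) always follows the element type of v. */
static FFTComplex16 *alloc_complex16(size_t n)
{
    FFTComplex16 *v;
    assert(n <= SIZE_MAX / sizeof(*v)); /* guard the size multiplication */
    v = malloc(sizeof(*v) * n);         /* stand-in for av_malloc in this sketch */
    return v;
}
```

The caller still has to check for a NULL return and release the buffer with the matching free function.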

[...]
> +/* complex multiplication: p = a * b */
> +#define CMUL(pre, pim, are, aim, bre, bim) \

not doxygen compatible

[...]
> Index: libavcodec/fft-test.c
> ===================================================================
> --- libavcodec/fft-test.c	(revision 9807)
> +++ libavcodec/fft-test.c	(working copy)

this is a mess, see dct-test.c for how to test several implementations of
some code
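
A minimal sketch of the general idea (one reference, a table of candidate implementations, one shared comparison loop). This is not the actual layout of dct-test.c, and it uses a 1.15 multiply as a small stand-in for an FFT; all names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef int (*mul_fn)(int a, int b);

/* Reference: 1.15 product with rounding, computed in 64 bits. */
static int mul_ref(int a, int b)   { return (int)(((int64_t)a * b + 0x4000) >> 15); }
/* Candidates under test. */
static int mul_trunc(int a, int b) { return (a*b) >> 15; }
static int mul_round(int a, int b) { return (a*b + 0x4000) >> 15; }

struct impl { const char *name; mul_fn fn; };
static const struct impl impls[] = {
    { "trunc", mul_trunc },
    { "round", mul_round },
};

/* Largest absolute deviation from the reference over a grid of 16-bit inputs. */
static int max_error(mul_fn fn, mul_fn ref)
{
    int a, b, max = 0;
    assert(fn && ref);
    for (a = -32768; a < 32768; a += 257)
        for (b = -32768; b < 32768; b += 263) {
            int e = abs(fn(a, b) - ref(a, b));
            if (e > max)
                max = e;
        }
    return max;
}

/* Run every table entry through the same loop; count those above tolerance. */
static int count_failures(int tolerance)
{
    size_t i;
    int failures = 0;
    for (i = 0; i < sizeof(impls) / sizeof(impls[0]); i++)
        if (max_error(impls[i].fn, mul_ref) > tolerance)
            failures++;
    return failures;
}
```

Adding another implementation then means adding one table entry rather than duplicating the test body.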

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

The educated differ from the uneducated as much as the living from the
dead. -- Aristotle 