[FFmpeg-devel] Fixpoint FFT optimization, with MDCT and IMDCT wrappers for audio optimization

Marc Hoffman mmhoffm
Mon Jul 30 22:09:52 CEST 2007


On 7/30/07, Michael Niedermayer <michaelni at gmx.at> wrote:
>
> Hi
>
> On Mon, Jul 30, 2007 at 03:26:54PM -0400, Marc Hoffman wrote:
> > On 7/30/07, Michael Niedermayer <michaelni at gmx.at> wrote:
> > >
> > > Hi
> > >
> > > On Mon, Jul 30, 2007 at 09:11:34AM -0400, Marc Hoffman wrote:
> > > > On 7/30/07, Michael Niedermayer <michaelni at gmx.at> wrote:
> > > > >
> > > > > Hi
> > > > >
> > > > > On Sun, Jul 29, 2007 at 08:13:38PM -0400, Marc Hoffman wrote:
> > > > > > On 7/29/07, Diego Biurrun <diego at biurrun.de> wrote:
> > > > > > > On Sun, Jul 29, 2007 at 07:20:59PM -0400, Marc Hoffman wrote:
> > > > > > > >
> > > > > > > > sorry about the mime type gmail doesn't allow me to mark it
> as
> > > > > > > > text/x-patch.  This makes config changes.
> > > > > > >
> > > > > > > > --- configure (revision 9807)
> > > > > > > > +++ configure (working copy)
> > > > > > > > @@ -573,6 +574,7 @@
> > > > > > > >      bktr
> > > > > > > >      dc1394
> > > > > > > >      dv1394
> > > > > > > > +    fixedpoint
> > > > > > > >      ffmpeg
> > > > > > > >      ffplay
> > > > > > > >      ffserver
> > > > > > > > @@ -665,6 +667,7 @@
> > > > > > > >      fast_64bit
> > > > > > > >      fast_cmov
> > > > > > > >      fast_unaligned
> > > > > > > > +    fixedpoint
> > > > > > > >      fork
> > > > > > > >      freetype2
> > > > > > > >      GetProcessTimes
> > > > > > >
> > > > > > > Just CONFIG_LIST is enough.
> > > > > > >
> > > > > > > > --- libavcodec/Makefile       (revision 9807)
> > > > > > > > +++ libavcodec/Makefile       (working copy)
> > > > > > > > @@ -358,6 +358,10 @@
> > > > > > > >  OBJS-$(CONFIG_VP6F_DECODER)            += i386/vp3dsp_mmx.o
> > > > > i386/vp3dsp_sse2.o
> > > > > > > >  endif
> > > > > > > >
> > > > > > > > +ifeq ($(HAVE_FIXEDPOINT),yes)
> > > > > > > > +OBJS += fft_fixedpoint.o
> > > > > > > > +endif
> > > > > > >
> > > > > > > Do this in one line, like for all the other files.
> > > > > > >
> > > > > >
> > > > > > Ok guys, I removed myself from the have list...  And correct the
> > > > > > makefile like you asked before.  Much simpiler.  Again sorry
> about
> > > the
> > > > > > mime attachment....
> > > > > [...]
> > > > > > +/*
> > > > > > +  This is a fixpoint inplace 16bit FFT which accepts 3
> arguments:
> > > > > > +
> > > > > > +  @param X   - input signal in format 1.15
> > > > > > +  @param W   - phase factors in 1.15 format
> > > > > > +  @param lgN - log_2(N) where N is the size of the input data
> set.
> > > > > > +
> > > > > > +  X is the output and its adjusted format is S(1+lgN.15-lgN)
> i.e.
> > > > > > +    if we are talking about a 256 point fft then the output
> format
> > > is
> > > > > 9.6.
> > > > > > +*/
> > > > >
> > > > > not doxygen compatible
> > > > >
> > > > >
> > > > > [...]
> > > > > > +                tr        = (X[k2].re*wwr + 0x4000)>>15;
> > > > > > +                tr       -= (X[k2].im*wwi + 0x4000)>>15;
> > > > > > +                ti        = (X[k2].re*wwi + 0x4000)>>15;
> > > > > > +                ti       += (X[k2].im*wwr + 0x4000)>>15;
> > > > > > +
> > > > > > +                X[k2].re  = (X[k].re - tr)>>1;
> > > > > > +                X[k2].im  = (X[k].im - ti)>>1;
> > > > > > +
> > > > > > +                X[k].re   = (X[k].re + tr)>>1;
> > > > > > +                X[k].im   = (X[k].im + ti)>>1;
> > > > >
> > > > > why not >>16 ? that way you would have 4 shifts less
> > > >
> > > >
> > > > At a greater loss of precision we could do that.  You realize those
> > > shifts
> > > > are guards against overflow, and they cause precision to be dynamic
> > > based on
> > > > the number of stages.
> > >
> > > if you have a signal of constant -32768 then the constant term of a
> > > 512 point FFT will be -512*32768/sqrt(512) this is 741455 which doesnt
> > > fit in 16bit
> > >
> > > if we now assume that you rescale this so it does not overflow, that
> is
> > > the
> > > largest term becomes -32768 and you do a IFFT with a /2 downscale in
> every
> > > step then you get a constant signal of 64
> > > thats a huge 7bits of precission left per sample (if we assume that
> the
> > > intermediate
> > > calculations dont ntroduce any inaccuracies), a 8bit ISA soundcard
> from
> > > 15 years ago should sound better ...
> > >
> > > if you want to optimize the code please first do highlevel
> optimizations
> > > that is change the code to a split radix fft
> > > then _while_ very carefully testing the accuracy on _real_ sound
> perform
> > > low level optimizations
> >
> >
> > I guess I'm missing something here.  Why are you so bent on this split
> radix
> > thing?
>
> common sense?
> first you choose the most efficient algorithm then AFTER THAT you go and
> optimize it, what you do is like optimizing a bubble sort while refusing
> to
> use a quicksort ...
>
> i agreed to your original proposal of doing optimizations later i do not
> agree to low level optimizations of an inefficient FFT


And which algorithm is the most efficient?




More information about the ffmpeg-devel mailing list