[FFmpeg-devel] Fixpoint FFT optimization, with MDCT and IMDCT wrappers for audio optimization
Michael Niedermayer
michaelni
Mon Jul 30 13:29:13 CEST 2007
Hi
On Sun, Jul 29, 2007 at 08:13:38PM -0400, Marc Hoffman wrote:
> On 7/29/07, Diego Biurrun <diego at biurrun.de> wrote:
> > On Sun, Jul 29, 2007 at 07:20:59PM -0400, Marc Hoffman wrote:
> > >
> > > Sorry about the MIME type; Gmail doesn't allow me to mark it as
> > > text/x-patch. This makes config changes.
> >
> > > --- configure (revision 9807)
> > > +++ configure (working copy)
> > > @@ -573,6 +574,7 @@
> > > bktr
> > > dc1394
> > > dv1394
> > > + fixedpoint
> > > ffmpeg
> > > ffplay
> > > ffserver
> > > @@ -665,6 +667,7 @@
> > > fast_64bit
> > > fast_cmov
> > > fast_unaligned
> > > + fixedpoint
> > > fork
> > > freetype2
> > > GetProcessTimes
> >
> > Just CONFIG_LIST is enough.
> >
> > > --- libavcodec/Makefile (revision 9807)
> > > +++ libavcodec/Makefile (working copy)
> > > @@ -358,6 +358,10 @@
> > > OBJS-$(CONFIG_VP6F_DECODER) += i386/vp3dsp_mmx.o i386/vp3dsp_sse2.o
> > > endif
> > >
> > > +ifeq ($(HAVE_FIXEDPOINT),yes)
> > > +OBJS += fft_fixedpoint.o
> > > +endif
> >
> > Do this in one line, like for all the other files.
> >
>
> Ok guys, I removed myself from the HAVE list and corrected the
> Makefile like you asked before. Much simpler. Again, sorry about the
> MIME attachment.
[...]
> +/*
> + This is a fixpoint inplace 16bit FFT which accepts 3 arguments:
> +
> + @param X - input signal in format 1.15
> + @param W - phase factors in 1.15 format
> + @param lgN - log_2(N) where N is the size of the input data set.
> +
> + X is the output and its adjusted format is S(1+lgN.15-lgN) i.e.
> + if we are talking about a 256 point fft then the output format is 9.6.
> +*/
not doxygen compatible
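For illustration, a doxygen comment opens with /** and uses @param and
@return tags. The sketch below is hypothetical: fft16_out_int_bits and
fft16_out_frac_bits are not names from the patch, they only encode the
S(1+lgN.15-lgN) rule quoted above, assuming one halving per stage. By
that rule lgN = 8 gives 9 integer bits and 7 fractional bits.

```c
/**
 * Number of integer bits (sign included) in the output of a 1.15
 * fixed-point FFT, assuming each of the lgN stages scales by 1/2.
 * Hypothetical helper, for illustration only.
 *
 * @param lgN log2(N), where N is the number of input points
 * @return 1 + lgN
 */
static int fft16_out_int_bits(int lgN)
{
    return 1 + lgN;
}

/**
 * Number of fractional bits left in the same output.
 * Hypothetical helper, for illustration only.
 *
 * @param lgN log2(N), where N is the number of input points
 * @return 15 - lgN
 */
static int fft16_out_frac_bits(int lgN)
{
    return 15 - lgN;
}
```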
[...]
> + tr = (X[k2].re*wwr + 0x4000)>>15;
> + tr -= (X[k2].im*wwi + 0x4000)>>15;
> + ti = (X[k2].re*wwi + 0x4000)>>15;
> + ti += (X[k2].im*wwr + 0x4000)>>15;
> +
> + X[k2].re = (X[k].re - tr)>>1;
> + X[k2].im = (X[k].im - ti)>>1;
> +
> + X[k].re = (X[k].re + tr)>>1;
> + X[k].im = (X[k].im + ti)>>1;
Why not >>16? That way you would need 4 fewer shifts.
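A minimal sketch of one way to get there, assuming both products are
summed at 32 bits and rounded once. Cpx16 and butterfly_q15 are
hypothetical names, W is assumed to be a unit-magnitude twiddle so the
32-bit sum cannot overflow, and >> on negative values is assumed to be
arithmetic, as the patch itself assumes. Rounding differs slightly from
the >>15 followed by >>1 version.

```c
#include <stdint.h>

/* Hypothetical stand-in for the patch's 16-bit complex type. */
typedef struct { int16_t re, im; } Cpx16;

/*
 * One radix-2 butterfly with the 1/2 scaling folded into the twiddle
 * multiply: a = (a + w*b)/2, b = (a - w*b)/2.
 * Assumes inputs are scaled so the results fit in 16 bits.
 */
static void butterfly_q15(Cpx16 *a, Cpx16 *b, int16_t wr, int16_t wi)
{
    /* w*b at half scale: one rounding add and one shift per component */
    int32_t tr = ((int32_t)b->re * wr - (int32_t)b->im * wi + 0x8000) >> 16;
    int32_t ti = ((int32_t)b->re * wi + (int32_t)b->im * wr + 0x8000) >> 16;
    /* a at half scale */
    int32_t ar = a->re >> 1;
    int32_t ai = a->im >> 1;

    b->re = (int16_t)(ar - tr);
    b->im = (int16_t)(ai - ti);
    a->re = (int16_t)(ar + tr);
    a->im = (int16_t)(ai + ti);
}
```

This form uses four shifts per butterfly where the quoted code uses
eight.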
[...]
> + w=0;
> + hm = m>>1;
> + for (j=0; j<hm; j++) {
tabs
[...]
> +
> + X[k2].re = (X[k].re - tr);
> + X[k2].im = (X[k].im - ti);
> +
> + X[k].re = (X[k].re + tr);
> + X[k].im = (X[k].im + ti);
superfluous ()
[...]
> + tr = (X[k2].re*wwr + 0x40000000)>>31;
> + tr -= (X[k2].im*wwi + 0x40000000)>>31;
> + ti = (X[k2].re*wwi + 0x40000000)>>31;
> + ti += (X[k2].im*wwr + 0x40000000)>>31;
>>32 !!
With >>31 this code is just not useful: CPUs tend to have 32-bit
registers, not 31-bit ones, so this is just the same as the other code.
With >>32 AND without the + 0x..., several operations could be avoided,
making this as fast as the 16-bit code on a reasonable CPU.
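A minimal sketch of the >>32 idea, assuming 32-bit operands and a
64-bit intermediate product. mulh32 and cmul_hi32 are hypothetical
names. Keeping only the upper 32 bits is what a 32x32->64 multiply
leaves in its high result register on many CPUs, so no separate shift
or rounding add is needed. The result is at half the scale of the >>31
version, which the caller has to account for, and b is assumed to be a
unit-magnitude twiddle so the sums stay within 32 bits.

```c
#include <stdint.h>

/* High 32 bits of the signed 64-bit product (truncating, no rounding). */
static inline int32_t mulh32(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) >> 32);
}

/* Complex multiply p = a * b, result at half scale relative to >>31. */
static void cmul_hi32(int32_t *pre, int32_t *pim,
                      int32_t are, int32_t aim,
                      int32_t bre, int32_t bim)
{
    *pre = mulh32(are, bre) - mulh32(aim, bim);
    *pim = mulh32(are, bim) + mulh32(aim, bre);
}
```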
[...]
> + FFTComplex16 *v = av_malloc (sizeof (short)*n);
type mismatch
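A minimal sketch of a size expression that cannot drift away from the
pointer type. Complex16 stands in for FFTComplex16 and plain malloc
stands in for av_malloc so the example is self-contained. Whether n in
the patch counts complex elements or shorts is not visible in the
quoted hunk; here n is assumed to count complex elements.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the patch's FFTComplex16. */
typedef struct { int16_t re, im; } Complex16;

/* Allocate n complex elements; returns NULL on overflow or failure. */
static Complex16 *alloc_complex16(size_t n)
{
    Complex16 *v;

    if (n > SIZE_MAX / sizeof(*v))
        return NULL;
    /* sizeof(*v) follows the declared type of v automatically */
    v = malloc(n * sizeof(*v));
    return v;
}
```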
[...]
> +/* complex multiplication: p = a * b */
> +#define CMUL(pre, pim, are, aim, bre, bim) \
not doxygen compatible
[...]
> Index: libavcodec/fft-test.c
> ===================================================================
> --- libavcodec/fft-test.c (revision 9807)
> +++ libavcodec/fft-test.c (working copy)
This is a mess; see dct-test.c for how to test several implementations
of some code.
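As a rough illustration of the table-driven approach, and not the
actual layout of dct-test.c: every implementation goes into one table
and a single loop compares each entry against the first. All names
below (Cpx, fft4_direct, fft4_radix2, algos, count_mismatches) are
hypothetical, and a 4-point integer transform is used only to keep the
sketch short and exact.

```c
#include <stddef.h>

typedef struct { int re, im; } Cpx;
typedef void (*fft4_func)(Cpx *x);   /* in-place 4-point transform */

/* Implementation 1: direct O(N^2) DFT, twiddles W4^k = 1, -i, -1, i. */
static void fft4_direct(Cpx *x)
{
    static const Cpx w[4] = { {1, 0}, {0, -1}, {-1, 0}, {0, 1} };
    Cpx y[4];
    int k, n;

    for (k = 0; k < 4; k++) {
        y[k].re = y[k].im = 0;
        for (n = 0; n < 4; n++) {
            Cpx t = w[(k * n) & 3];
            y[k].re += x[n].re * t.re - x[n].im * t.im;
            y[k].im += x[n].re * t.im + x[n].im * t.re;
        }
    }
    for (k = 0; k < 4; k++)
        x[k] = y[k];
}

/* Implementation 2: two radix-2 stages. */
static void fft4_radix2(Cpx *x)
{
    Cpx a = { x[0].re + x[2].re, x[0].im + x[2].im };
    Cpx b = { x[0].re - x[2].re, x[0].im - x[2].im };
    Cpx c = { x[1].re + x[3].re, x[1].im + x[3].im };
    Cpx d = { x[1].re - x[3].re, x[1].im - x[3].im };

    x[0].re = a.re + c.re;  x[0].im = a.im + c.im;
    x[2].re = a.re - c.re;  x[2].im = a.im - c.im;
    x[1].re = b.re + d.im;  x[1].im = b.im - d.re;   /* b - i*d */
    x[3].re = b.re - d.im;  x[3].im = b.im + d.re;   /* b + i*d */
}

/* One table lists every implementation under test. */
static const struct algo {
    const char *name;
    fft4_func   func;
} algos[] = {
    { "direct", fft4_direct },
    { "radix2", fft4_radix2 },
};

#define NB_ALGOS (sizeof(algos) / sizeof(algos[0]))

/* Run every entry on a copy of in; count those that differ from algos[0]. */
static int count_mismatches(const Cpx *in)
{
    Cpx ref[4], out[4];
    size_t i;
    int k, bad = 0;

    for (k = 0; k < 4; k++)
        ref[k] = in[k];
    algos[0].func(ref);

    for (i = 1; i < NB_ALGOS; i++) {
        int differs = 0;
        for (k = 0; k < 4; k++)
            out[k] = in[k];
        algos[i].func(out);
        for (k = 0; k < 4; k++)
            differs |= out[k].re != ref[k].re || out[k].im != ref[k].im;
        bad += differs;
    }
    return bad;
}
```

A fixed-point versus float comparison would replace the exact equality
check with an error bound, but the table and loop stay the same.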
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
The educated differ from the uneducated as much as the living from the
dead. -- Aristotle