[FFmpeg-devel] [PATCH] avfilter/dctdnoiz: rewrite [f/i]dct
Clément Bœsch
u at pkh.me
Thu Aug 7 20:06:41 CEST 2014
On Mon, Aug 04, 2014 at 05:08:21PM +0200, Michael Niedermayer wrote:
[...]
> > > > + const float xd_2 = 1.306562964876380*xc_2 + 0.541196100146197*xc_3;
> > > > + const float xd_3 = 0.541196100146197*xc_2 - 1.306562964876380*xc_3;
> > > > + const float x1_9 = 0.707106781186547*xb_2 - 0.707106781186547*xd_3;
> > > > + const float x1_a = 0.707106781186547*xb_2 + 0.707106781186547*xd_3;
> > > > + const float x1_b = 0.707106781186547*xb_1 + 0.707106781186547*xd_1;
> > > > + const float x1_c = 0.707106781186547*xb_1 - 0.707106781186547*xd_1;
> > > > + const float x1_d = 0.707106781186547*xb_3 - 0.707106781186547*xd_2;
> > > > + const float x1_e = 0.707106781186547*xb_3 + 0.707106781186547*xd_2;
> > > > + dst[ 0*dst_stridea] = 0.25*x5_0;
> > > > + dst[ 1*dst_stridea] = 0.25*xb_0;
> > > > + dst[ 2*dst_stridea] = 0.25*x7_0;
> > > > + dst[ 3*dst_stridea] = 0.25*x1_9;
> > > > + dst[ 4*dst_stridea] = 0.25*x5_2;
> > > > + dst[ 5*dst_stridea] = 0.25*x1_a;
> > > > + dst[ 6*dst_stridea] = 0.25*x3_5;
> > > > + dst[ 7*dst_stridea] = 0.25*x1_b;
> > > > + dst[ 8*dst_stridea] = 0.25*x5_1;
> > > > + dst[ 9*dst_stridea] = 0.25*x1_c;
> > > > + dst[10*dst_stridea] = 0.25*x3_6;
> > > > + dst[11*dst_stridea] = 0.25*x1_d;
> > > > + dst[12*dst_stridea] = 0.25*x5_3;
> > > > + dst[13*dst_stridea] = 0.25*x1_e;
> > > > + dst[14*dst_stridea] = 0.25*x7_2;
> > > > + dst[15*dst_stridea] = 0.25*xd_0;
> > >
> > > many of these multiplies look like they can be merged into other
> > > multiplies
> > >
> > > for example see:
> > >
> > >
> > > const float xd_2 = 1.306562964876380*xc_2 + 0.541196100146197*xc_3;
> > > const float xb_3 = 0.541196100146197*xa_2 - 1.306562964876380*xa_3;
> > > const float x1_d = 0.707106781186547*xb_3 - 0.707106781186547*xd_2;
> > > const float x1_e = 0.707106781186547*xb_3 + 0.707106781186547*xd_2;
> > > dst[11*dst_stridea] = 0.25*x1_d;
> > > dst[13*dst_stridea] = 0.25*x1_e;
> > >
> > > vs.
> > >
> > > const float xd_2 = (0.25*0.707106781186547*1.306562964876380)*xc_2 + (0.25*0.707106781186547*0.541196100146197)*xc_3;
> > > const float xb_3 = (0.25*0.707106781186547*0.541196100146197)*xa_2 - (0.25*0.707106781186547*1.306562964876380)*xa_3;
> > > dst[11*dst_stridea] = xb_3 - xd_2;
> > > dst[13*dst_stridea] = xb_3 + xd_2;
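
To make the merge concrete, here is a minimal sketch in C comparing the two forms for outputs 11 and 13. The function names (tail_unmerged, tail_merged) and the macro names are made up for illustration and are not part of the patch; only the constants and the arithmetic come from the snippet quoted above.

```c
/* Constants as they appear in the quoted code. */
#define C_A 1.306562964876380f
#define C_B 0.541196100146197f
#define C_H 0.707106781186547f

/* Unmerged form: rotation, then the 0.707... butterfly, then the 0.25
 * output scale. 4 + 4 + 2 = 10 multiplies for two outputs. */
static void tail_unmerged(float xa_2, float xa_3, float xc_2, float xc_3,
                          float *o11, float *o13)
{
    const float xd_2 = C_A*xc_2 + C_B*xc_3;
    const float xb_3 = C_B*xa_2 - C_A*xa_3;
    const float x1_d = C_H*xb_3 - C_H*xd_2;
    const float x1_e = C_H*xb_3 + C_H*xd_2;
    *o11 = 0.25f*x1_d;
    *o13 = 0.25f*x1_e;
}

/* Merged form: 0.25 and 0.707... are folded into the rotation
 * coefficients, so only the 4 rotation multiplies remain. */
static void tail_merged(float xa_2, float xa_3, float xc_2, float xc_3,
                        float *o11, float *o13)
{
    const float ka = 0.25f*C_H*C_A; /* folded at compile time */
    const float kb = 0.25f*C_H*C_B;
    const float xd_2 = ka*xc_2 + kb*xc_3;
    const float xb_3 = kb*xa_2 - ka*xa_3;
    *o11 = xb_3 - xd_2;
    *o13 = xb_3 + xd_2;
}
```

Both functions compute the same values up to float rounding; the merged one just does it with fewer multiplies.
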
> >
> > also more generally
> > if you have 2 stages of butterflies each with 4 multiplies and 2 adds
> > in each butterfly
> >
> > a----\-/--\---/----------a'
> >       X    \ /
> > b----/-\----------\---/--b'
> >            / \     \ /
> > c----\-/--/---\----------c'
> >       X            / \
> > d----/-\----------/---\--d'
> >
> >
> > of additions
> > the first stage can scale its outputs arbitrarily for free by
> > changing the respective coefficients
> > the second stage can use any scaled input for free by adjusting its
> > coefficients similarly, this gives you 4 free parameters in the
> > example above which can be
> > chosen so as to make some coefficients trivial like +-1.0
> > this also works across 2D (I)DCTs or with other things before or
> > after the (i)dct which can absorb such rescaling
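
The rescaling argument can be sketched in C as follows. Everything here (the Bfly and Net types, net_apply, net_rescale) is hypothetical scaffolding for illustration, not code from the patch. It models the two-stage network as four generic butterflies with 4 multiplies and 2 adds each, and treats the four wires between the stages as the four free scale parameters.

```c
/* Generic 2-input butterfly: u = c00*x + c01*y, v = c10*x + c11*y. */
typedef struct { float c00, c01, c10, c11; } Bfly;

/* Stage 1 pairs (a,b) and (c,d); stage 2 pairs (a,c) and (b,d). */
typedef struct { Bfly ab, cd, ac, bd; } Net;

static void bfly(const Bfly *k, float x, float y, float *u, float *v)
{
    *u = k->c00*x + k->c01*y;
    *v = k->c10*x + k->c11*y;
}

static void net_apply(const Net *n, const float in[4], float out[4])
{
    float a1, b1, c1, d1; /* the four inter-stage wires */
    bfly(&n->ab, in[0], in[1], &a1, &b1);
    bfly(&n->cd, in[2], in[3], &c1, &d1);
    bfly(&n->ac, a1, c1, &out[0], &out[2]);
    bfly(&n->bd, b1, d1, &out[1], &out[3]);
}

/* Scale wires a1, b1, c1, d1 by s[0..3] (all assumed nonzero): multiply
 * the first-stage row that produces each wire, and divide the
 * second-stage column that consumes it. The outputs are unchanged up to
 * rounding. */
static Net net_rescale(Net n, const float s[4])
{
    n.ab.c00 *= s[0]; n.ab.c01 *= s[0]; n.ac.c00 /= s[0]; n.ac.c10 /= s[0];
    n.ab.c10 *= s[1]; n.ab.c11 *= s[1]; n.bd.c00 /= s[1]; n.bd.c10 /= s[1];
    n.cd.c00 *= s[2]; n.cd.c01 *= s[2]; n.ac.c01 /= s[2]; n.ac.c11 /= s[2];
    n.cd.c10 *= s[3]; n.cd.c11 *= s[3]; n.bd.c01 /= s[3]; n.bd.c11 /= s[3];
    return n;
}
```

Choosing s = { ac.c00, bd.c00, ac.c11, bd.c11 } turns those four second-stage coefficients into exactly 1.0, so their multiplies can be dropped. Whether that choice is the best one for a given transform is a separate question.
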
>
> also i suggest that the patch is applied before time is
> spent optimizing it further,
> also the moving around of multiplies can probably affect numerical
> stability if overdone
I factorized obvious cases as you suggested. I also made the generated
code more readable. Further optimizations are postponed.
Applied, thanks!
--
Clément B.