[FFmpeg-devel] [RFC/PATCH] More flexible variant of float_to_int16, WMA optimization, Vorbis

Fri Jul 18 08:17:05 CEST 2008

On Thursday 17 July 2008, Loren Merritt wrote:
> On Wed, 16 Jul 2008, Siarhei Siamashka wrote:
> > Well, merging the loops that are run after iFFT and combining them with
> > windowing code can probably provide interesting results. At least it
> > should eliminate a lot of intermediate load and store operations. Maybe
> > having iFFT output processed in a single loop could allow reading old
> > saved data and also replace it with new saved data at the same time? At
> > least in some simple cases when previous and current blocks have the same
> > size.
>
> Kinda ugly, having to merge those functions instead of composing them out
> of small readable dsps. But it works.
> total vorbis speedup
> k8: 1.3%
> conroe: 5%
> penryn: 8%
> prescott: no change

Thanks. With this patch applied, the performance improvement is about 2% on
Pentium-M.

Also, could you consider using a variation of CMUL that negates the .re
value in the post rotation step of 'ff_imdct_half'? Something like:

#define CMUL_NEG(pre, pim, are, aim, bre, bim) \
{\
    double _are = (are);\
    double _aim = (aim);\
    double _bre = (bre);\
    double _bim = (bim);\
    (pre) = _aim * _bim - _are * _bre;\
    (pim) = _are * _bim + _aim * _bre;\
}

...
/* post rotation */
for(k = 0; k < n4; k++) {
    CMUL_NEG(z[k].re, z[k].im, z[k].re, z[k].im, tcos[k], tsin[k]);
}
...

In this case you don't have to change the sign when loading data from the 'fft'
buffer in 'ff_vector_fmul_window', which should make the code faster.
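
To make the sign argument concrete, here is a small standalone check that
CMUL_NEG gives the same result as a plain complex multiply followed by a
negation of the real part. This is only a sketch under assumptions: the
'Complex' struct is a stand-in for the real FFT sample type, the CMUL shown
here is assumed to be the usual complex multiply, and the sample values and
function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the FFT complex sample type. */
typedef struct { float re, im; } Complex;

/* Assumed reference: ordinary complex multiply (a * b). */
#define CMUL(pre, pim, are, aim, bre, bim) \
{\
    double _are = (are);\
    double _aim = (aim);\
    double _bre = (bre);\
    double _bim = (bim);\
    (pre) = _are * _bre - _aim * _bim;\
    (pim) = _are * _bim + _aim * _bre;\
}

/* Proposed variant: same product, but with the real part negated. */
#define CMUL_NEG(pre, pim, are, aim, bre, bim) \
{\
    double _are = (are);\
    double _aim = (aim);\
    double _bre = (bre);\
    double _bim = (bim);\
    (pre) = _aim * _bim - _are * _bre;\
    (pim) = _are * _bim + _aim * _bre;\
}

/* Returns 1 if CMUL_NEG yields (-re, im) of CMUL for every test sample. */
int cmul_neg_matches_reference(void)
{
    /* Made-up test data; all values are exactly representable as floats. */
    static const Complex z[] = {
        { 1.0f, 2.0f }, { -0.5f, 0.25f }, { 3.0f, -4.0f }, { 0.125f, 0.75f }
    };
    static const float tcos[] = { 0.5f, -0.25f, 0.75f, 1.0f };
    static const float tsin[] = { 0.25f, 0.5f, -0.5f, 0.0f };
    size_t k;

    for (k = 0; k < sizeof(z) / sizeof(z[0]); k++) {
        Complex ref, neg;
        CMUL(ref.re, ref.im, z[k].re, z[k].im, tcos[k], tsin[k]);
        CMUL_NEG(neg.re, neg.im, z[k].re, z[k].im, tcos[k], tsin[k]);
        if (neg.re != -ref.re || neg.im != ref.im)
            return 0;
    }
    return 1;
}

/* Convenience wrapper that aborts if the two variants ever disagree. */
void cmul_neg_self_check(void)
{
    assert(cmul_neg_matches_reference());
}
```

Since the negation is folded into the subtraction order, it costs no extra
operation in the rotation loop, and a consumer that previously flipped the
sign on load can read the values as they are.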

Alternatively, post rotation could also be merged into 'ff_vector_fmul_window',
so the values don't need to land in memory only to be read back shortly after
that.
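
A rough illustration of that second idea, as a sketch only: the "window" step
below is a simplified, hypothetical consumer and not the actual
'ff_vector_fmul_window', and the function names, the 'Complex' struct and the
test data are all assumptions. It just shows that a fused loop can produce
the same output as the two-pass version without storing the rotated values.

```c
#include <assert.h>

/* Hypothetical stand-in for the FFT complex sample type. */
typedef struct { float re, im; } Complex;

/* Two passes: rotated values are stored to tmp[] and then loaded again. */
static void rotate_then_window(const Complex *z, const float *tcos,
                               const float *tsin, const float *win,
                               Complex *tmp, float *out, int n)
{
    int k;
    for (k = 0; k < n; k++) {
        tmp[k].re = z[k].re * tcos[k] - z[k].im * tsin[k];
        tmp[k].im = z[k].re * tsin[k] + z[k].im * tcos[k];
    }
    for (k = 0; k < n; k++) {
        out[2 * k]     = tmp[k].re * win[2 * k];
        out[2 * k + 1] = tmp[k].im * win[2 * k + 1];
    }
}

/* One pass: rotated values stay in locals and go straight to the output. */
static void rotate_and_window_fused(const Complex *z, const float *tcos,
                                    const float *tsin, const float *win,
                                    float *out, int n)
{
    int k;
    for (k = 0; k < n; k++) {
        float re = z[k].re * tcos[k] - z[k].im * tsin[k];
        float im = z[k].re * tsin[k] + z[k].im * tcos[k];
        out[2 * k]     = re * win[2 * k];
        out[2 * k + 1] = im * win[2 * k + 1];
    }
}

/* Returns 1 if both variants produce identical output on the test data. */
int fused_matches_two_pass(void)
{
    /* Made-up test data; all values are exactly representable as floats. */
    static const Complex z[4] = {
        { 1.0f, 2.0f }, { -0.5f, 0.25f }, { 3.0f, -4.0f }, { 0.125f, 0.75f }
    };
    static const float tcos[4] = { 0.5f, -0.25f, 0.75f, 1.0f };
    static const float tsin[4] = { 0.25f, 0.5f, -0.5f, 0.0f };
    static const float win[8]  = { 0.5f, 1.0f, 0.25f, 0.75f,
                                   0.75f, 0.25f, 1.0f, 0.5f };
    Complex tmp[4];
    float a[8], b[8];
    int i;

    rotate_then_window(z, tcos, tsin, win, tmp, a, 4);
    rotate_and_window_fused(z, tcos, tsin, win, b, 4);

    for (i = 0; i < 8; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}

/* Convenience wrapper that aborts if the two variants ever disagree. */
void fused_self_check(void)
{
    assert(fused_matches_two_pass());
}
```

The fused version drops one store and one load per value, at the cost of a
less modular function, which is the same trade-off mentioned in the quoted
text above.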

-- 
Best regards,
Siarhei Siamashka