[FFmpeg-devel] Review request - ra288.{c,h} ra144.{c,h}

Michael Niedermayer michaelni
Wed Sep 24 16:07:23 CEST 2008

On Thu, Sep 18, 2008 at 12:00:42PM +0300, Siarhei Siamashka wrote:
> On Wednesday 17 September 2008, Michael Niedermayer wrote:
> > On Wed, Sep 17, 2008 at 02:48:41AM +0300, Siarhei Siamashka wrote:
> [...]
> > > Well, the summary of my message was the following: "if you want to find
> > > an easy optimization target, grep ffmpeg sources for lrintf". One of such
> > > targets is WMA decoder, using 'float_to_int16_interleave' in it is quite
> > > trivial and provides a very noticeable performance improvement, there
> > > were even several patches floating around which can be used with minor
> > > changes.
> >
> > yes, and no ....
> > yes lrintf() is a easy optimization target, but its not
> > float_to_int16_interleave() that should be used, instead the floats
> > themselfs should be returned.
> >
> > > It would not be very nice to use lrintf in new code and dsputil functions
> > > should be preferred. For the targets (if such targets exist) where lrintf
> > > is the fastest way to convert float to integer, the dsputil function
> > > should be implemented using lrintf. Otherwise it should use SIMD
> > > instructions available on the target system, or unrolled/pipelined
> > > sequence of instructions.
> >
> > as said above, IMO the floats should be returned, but it for some reason
> > didnt work when vitor tried so i suggested lrintf() until floats can be
> > returned. dsputil would be IMHO overkill as its not supposed to stay like
> > that when float output is fixed.
> > If it was supposed tp stay then yes id agree with you that dsputil should
> > be used.
> Moving conversion outside of the decoder makes everything more modular. But
> modularity also sometimes has a cost. Let's consider the following example:
> decoder X: A -> B -> C -> D ...
> decoder Y: E -> B -> F -> G ...
> decoder Z: A -> B -> H -> I ...
> Let's suppose that B and C operations can be fused together to make them
> faster (for example merging postrotation in IMDCT and applying windowing 
> function for vorbis). In this case keeping these operations separate impacts
> performance. Introducing alternative fast path for using fused BC operation
> can be a solution. Merging some operations may be beneficial for some
> platforms but bad for the others (the number of registers in x86 is small,
> preventing some optimizations), so this selection must be somewhat
> intelligent.
> I hope that making use of C version of float_to_int16 (it is to some extent an
> example of 'fused' operation now) will not make much troubles when it is moved
> out of dsputil.
> Moreover, ARM VFP code could use one more optimization trick. VFP has 
> instruction which can convert float to int32_t with saturation 
> built-in. But there is no direct way to convert to int16_t, so the 
> current code must do saturation of values to int16_t separately. If 
> decoder output is premultiplied by 64K, it should be possible to 
> avoid this extra saturation operation and improve performance (just 
> take high 16-bits of the converted values instead of lower 16-bits).
> This also has an issue with rounding precision, but performance 
> improvement may be worth it.
> Will it be still possible to use such optimizations in a redesigned ffmpeg?

Of course, there really are many ways this can be done ...

Codecs that can scale their output efficiently could support some
AVCodeContext.output_scale (and a offset for the C convertion)
We can just introduce a float sample format that is scaled by 64k


Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Observe your enemies, for they first find out your faults. -- Antisthenes
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080924/3635b9f8/attachment.pgp>

More information about the ffmpeg-devel mailing list