[FFmpeg-devel] Review request - ra288.{c,h} ra144.{c,h}

Siarhei Siamashka siarhei.siamashka
Wed Sep 17 01:48:41 CEST 2008

On Wednesday 17 September 2008, Michael Niedermayer wrote:
> On Wed, Sep 17, 2008 at 12:06:31AM +0300, Siarhei Siamashka wrote:
> > On Tuesday 16 September 2008, Vitor Sessak wrote:
> > > Siarhei Siamashka wrote:
> [...]
> > > > You can try experimenting with compression of a simple signal,
> > > > something like sine and check if you can see similarities in the
> > > > input and output files. And then change this signal to something more
> > > > complicated until the differences start getting more visible.
> > >
> > > I'm not sure that it is worth the trouble just to decide on using
> > > lrintf() or not...
> >
> > I see. Anyway, I don't like lrintf, it's slow, not quite portable
> > (depends on
> Do you have some benchmark between lrintf() and an int cast that confirms
> this? (with -O3 -fno-math-errno of course)
> because last time I checked lrintf() was faster, but of course that's just
> x86

Where did I suggest using an int cast instead of lrintf? That is basically
the whole answer.

> > global rounding settings) and it is only useful on very old x86 systems
> > (for which 'ff_float_to_int16_c' exists). Also as far as I know, SIMD
> > instructions at least from 3DNOW and ARM NEON only efficiently support
> > conversion with rounding to zero (please correct me if I'm wrong). And
> > conversion to int with
> Which compiler generates 3dnow or NEON instructions for an int cast?
> If none, then I am not sure how this could be an argument for preferring an
> int cast.
> > rounding to zero should be supported well on any hardware designed to be
> > C language friendly.
> Well, it is not on pre SSE(2) x86, and on post it requires the compiler to
> generate pure SSE/SSE2 code and not utilize the x87 unit. Also, binaries
> compiled with sse2 will not run on pre SSE2 (before P4) cpus.

Well, the summary of my message was the following: "if you want to find an
easy optimization target, grep the ffmpeg sources for lrintf". One such target
is the WMA decoder: using 'float_to_int16_interleave' in it is quite trivial
and provides a very noticeable performance improvement. There were even
several patches floating around which can be used with minor changes.

lrintf should be avoided in new code; the dsputil functions should be
preferred instead. For the targets (if such targets exist) where lrintf is
the fastest way to convert float to integer, the dsputil function should be
implemented using lrintf. Otherwise it should use the SIMD instructions
available on the target system, or an unrolled/pipelined sequence of
instructions.

Now regarding rounding direction. All more or less modern systems have fast
SIMD float to int conversion with rounding to zero. That is probably the
influence of the C language, which for some weird reason was not taken into
account when designing x87 (hence the performance troubles with a standard
cast to int on legacy x86 systems). On the other hand, 3DNOW and ARM NEON are
missing fast SIMD instructions for converting float to int using the default
rounding mode (round to nearest). Their output can therefore differ from the
lrintf-based C code, so such SIMD-optimized functions cannot be used together
with the CODEC_FLAG_BITEXACT flag, which is not very convenient.

Best regards,
Siarhei Siamashka
