[FFmpeg-devel] FATE & Regressions (and PPC is broken)

Sun Mar 16 08:44:27 CET 2008

On 16 March 2008, Michael Niedermayer wrote:
> On Sun, Mar 16, 2008 at 12:26:15AM +0000, M?ns Rullg?rd wrote:
> > M?ns Rullg?rd <mans at mansr.com> writes:
> > > Diego Biurrun <diego at biurrun.de> writes:
> > >> On Sat, Mar 15, 2008 at 02:24:41AM +0100, Michael Niedermayer wrote:
> > >>> On Sat, Mar 15, 2008 at 12:36:36AM +0000, M?ns Rullg?rd wrote:
> > >>> > Mike Melanson <mike at multimedia.cx> writes:
> > >>> > > The following regression test is broken on PowerPC (gcc 4.1.2):
> > >>> > >
> > >>> > > diff -u -w
> > >>> > > "/home/melanson/ffmpeg/ffmpeg-main"/tests/ffmpeg.regression.ref
> > >>> > > tests/data/vsynth.regression
> > >>> > > --- /home/melanson/ffmpeg/ffmpeg-main/tests/ffmpeg.regression.ref
> > >>> > > 2008-03-08 19:10:07.000000000 -0800
> > >>> > > +++ tests/data/vsynth.regression        2008-03-12
> > >>> > > 14:27:53.000000000 -0700 @@ -197,7 +197,7 @@
> > >>> > >  353368 ./tests/data/a-flac.flac
> > >>> > >  c4228df189aad9567a037727d0e763e4
> > >>> > > *./tests/data/flac.vsynth.out.wav stddev: 33.31 PSNR:65.87
> > >>> > > bytes:1040384
> > >>> > > -a1e56bb034c2773438ba1c830c4cea07 *./tests/data/a-wmav1.asf
> > >>> > > +59575b4756d674cd8c8d53d86c08e8cb *./tests/data/a-wmav1.asf
> > >>> > >  106004 ./tests/data/a-wmav1.asf
> > >>> > >  stddev:12251.50 PSNR:14.56 bytes:1056768
> > >>> > >  stddev:2106.00 PSNR:29.85 bytes:1048576
> > >>> > > make: *** [codectest] Error 1
> > >>> >
> > >>> > OK, I tracked this one down.  It turns out lrint() rounds slightly
> > >>> > differently for whatever reason (bug or otherwise).  Attached patch
> > >>> > changes it to lround() (which has well-defined rounding behaviour).
> > >>> > With this change, the results are the same on x86_64 and ppc64.
> > >>> >
> > >>> > This raises the broader question of lrint() usage in FFmpeg.  The
> > >>> > documentation for the lrint() family says:
> > >>> >
> > >>> >   These functions shall round their argument to the nearest integer
> > >>> >   value, rounding according to the current rounding direction.
> > >>> >
> > >>> > Clearly, in a library we do not know the current rounding
> > >>> > direction, and it would be rude of us to change it.  The lround()
> > >>> > family, on the other hand, always rounds the same way:
> > >>> >
> > >>> >   These functions shall round their argument to the nearest integer
> > >>> >   value, rounding halfway cases away from zero, regardless of the
> > >>> >   current rounding direction.
> > >>>
> > >>> We have to use lrint* because its faster see:
> > >>
> > >> Hmm, does speed really matter that much in that part of the WMA
> > >> encoder? Or is that just a comment for the general case and Mans'
> > >> patch is OK?
> > >
> > > Yes, speed matters.  Besides, further investigation shows the
> > > inaccuracy arises elsewhere.  This change worked by accident.
> >
> > I've tracked down the real reason for the difference.  It results from
> > lost precision by use of floats in mdct.c.  Attached patch makes a few
> > temporary values double instead.  I don't know what it might do to
> > performance, though.

Hi, I'm sorry to interfere, but it would be nice to clarify a few things.

Could you also test old code on x86 with different gcc options
(check -O0, -ffloat-store, -mfpmath=sse and the others) to see 
if affects results of this regression test?

The problem may be related to the fact that x87 has 80-bit registers 
and the precision can highly depend on the number of load/store 
operations (if temporary variables in mdct are assigned to registers, 
the precision may be higher, if temporary variables are allocated on 
stack, the precision would be lost on stores clipping 80-bit registers 
to 32-bit variables on stack). As far as I know, it is x87 weird here 
and most of the other platforms do floating point math in a right way.

The change from floats to doubles can increase precision when working
with temporaries for non-x86 platforms, and could reduce the effect,
but it is only the change from (32-bit vs. 80bit) to (64-bit to 80-bit) 
and it might only hide the problem for this particular regression test.

> I dont know about performance either, i guess there are some situations
> where it would be slower 

Regarding the performance, doubles are slower than floats when performing
multiplications at least on ARM (VFP11), also extra conversion from
float to double is needed on loads which affects performance even more.

> but systems should have asm MDCTS anyway ...
> So iam ok with the change. 

It still would be nice to keep C code as a reference implementation, so 
that it could be always possible to check if an assembly optimized mdct
produces bitexact results.

Just my 2 cents...