[MPlayer-dev-eng] possible bugs in vf_decimate filter

Tue Oct 17 03:03:39 CEST 2006

Hi

On Mon, Oct 16, 2006 at 04:54:59PM -0700, Trent Piepho wrote:
> On Mon, 16 Oct 2006, Michael Niedermayer wrote:
> >
> >
> >                 "decl %%ecx \n\t"
> >
> > id use a "cmp %%"REG_S", ...  here, some cpus have a dissike for inc/dec as
> > inc/dec just change part of the flags which creates a dependancy to the
> > previous flag value
> 
> The code is also using ecx without listing it as a clobbered register.  It

:/

[...]

> 
> >                 "jnz 1b \n\t"
> >                 "movq %%mm4, (%%"REG_d") \n\t"
> >                 "emms \n\t"
> >
> > emms should be farther outside
> 
> That's what I thought too.  But when I did it, it became _much_ slower!  Take
> a look at this code and see you notice what's wrong:
> 
> static int diff_to_drop_plane(int hi, int lo, float frac, unsigned char *old, unsigned char *new,
> 	int w, int h, int os, int ns)
> {
>         int x, y;
>         int d, c=0, r=0;
>         int t = (w/16)*(h/16)*frac;
>         for (y = 0; y < h-7; y += 4) {
>                 for (x = 0; x < w-7; x += 4) {
>                         d = diff(old+x+y*os, new+x+y*ns, os, ns);  /* 1 */
>                         if (d > hi) goto done;
>                         if (d > lo) {  c++; if (c > t) goto done; }
>                 }
>         }
>         r = 1;
> done:
> 	asm volatile("emms");   /* 2 */
>         return r;
> }
> 
> I removed the "emms" from the asm block in the diff_MMX() function.  This code
> ended up slowing the filter down by around 20%!  It has nothing do with the
> extra variable 'r' or the goto.  If I add an extra emms right after the call to
> diff at (1), it speeds up to around the original speed (even with the extra emms
> as (2), the goto, and the variable 'r').
> 
> You can clearly see that there is no floating point between (1), where the
> mmx diff function is called, and (2), where emms is run.  The variable 't'
> is used, but it is an int, which was calculated before the loop using a
> float, 'frac'.
> 
> There is no floating point in the C code, but look at the generated assembly!
> 
>         call    *diff
>         cmpl    %eax, 212(%esp)
>         flds    48(%esp)
>         jge     .L112
>         ffreep  %st(0)
> 
> You see the call to the diff_MMX function via the diff function pointer,
> which is only used in that one place.  Then there is some floating point,
> flds and ffreep, with NO emms!  The emms is in the asm code elsewhere, gcc
> has just moved the floating point around so it's not in the right spot
> anymore!
> 
> Using mmx in asm statements anywhere near floating point is very dangerous.
> For example, this code will miscompile and print "nan" if you use -O3 (so
> that mmxfunc() is inlined).  There is clearly not a missing emms here.  The
> problem is that gcc leaves 'f' in a floating pointer register across the
> inlined call to mmxfunc(), and the mmx code clears the fp regiters.

hmm what about adding the float registers to the clobber list? if that 
fails then the simplest solution for vf_decimate.c seems to me to replace 
the float by an int (fixedpoint, dunno what its range needs to be ...)
the alternatives are to put the mmx registers on the clobber list (breaks gcc
2.95 IIRC) or maybe some hack with volatile float  ...

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

In the past you could go to a library and read, borrow or copy any book
Today you'd get arrested for mere telling someone where the library is