[MPlayer-dev-eng] Improved remove-logo filter

Guillaume POIRIER poirierg at gmail.com
Tue Oct 31 10:31:55 CET 2006


Hi Trent,

Were you able to work on improving your patch?

On 9/15/06, Trent Piepho <xyzzy at speakeasy.org> wrote:
> On Thu, 14 Sep 2006, Loren Merritt wrote:
> > On Thu, 14 Sep 2006, Trent Piepho wrote:
> > > So I tried writing the inner loop (over one line) in asm so the
> > > accumulator would be kept in mm1 for the whole loop.  Gcc still spills
> > > and reloads the accumulator for no reason on each iteration of the outer
> > > loop (for each line).  This ended up being about the same speed.
> >
> > Not that there's anything wrong with writing the loops in asm, but you
> > don't have to do that just to keep the accumulator in an mmreg. "y"
> > constraints are not needed, unless you _want_ gcc to load/spill values.
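A minimal sketch of this suggestion, assuming an x86 target and GCC-style inline asm. The loops stay in C and each asm statement names %mm1 directly instead of passing the accumulator as a "y" operand, so gcc has nothing to load or spill. The helper `sum_block` and its layout (32-bit pixels, stride counted in elements) are hypothetical, not taken from the remove-logo patch. The trick depends on gcc never allocating MMX registers on its own, which is exactly the condition discussed in the reply below.

```c
#include <stdint.h>

/* Hypothetical helper: sum a w x h block of 32-bit pixels (stride in
 * elements). On x86 the running total stays in %mm1 between separate asm
 * statements; it is never an asm operand, so gcc never loads or spills it.
 * ASSUMPTION: gcc does not use MMX registers for its own code here. */
static uint32_t sum_block(const uint32_t *img, int w, int h, int stride)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    uint32_t result;
    /* The clobbers only stop gcc keeping its own data in these registers;
     * they do not make gcc save or restore them. */
    __asm__ volatile("pxor %%mm1, %%mm1" : : : "mm1");   /* total = 0 */
    for (int y = 0; y < h; y++) {
        const uint32_t *row = img + (long)y * stride;
        for (int x = 0; x < w; x++) {
            /* mm1 += row[x]; movd zero-extends, so only the low lane grows */
            __asm__ volatile("movd %0, %%mm0\n\t"
                             "paddd %%mm0, %%mm1"
                             : : "rm"(row[x]) : "mm0", "mm1");
        }
    }
    /* Read the total back once, then leave MMX state with emms. */
    __asm__ volatile("movd %%mm1, %0\n\t"
                     "emms"
                     : "=r"(result) : : "mm0", "mm1");
    return result;
#else
    /* Portable fallback so the sketch still builds on non-x86 targets. */
    uint32_t result = 0;
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            result += img[(long)y * stride + x];
    return result;
#endif
}
```

Because all asm statements are volatile, gcc keeps them in program order, which is what lets the accumulator survive from one statement to the next.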
>
> Good idea, I hadn't thought of trying that.  It only works as long as gcc
> doesn't touch the MMX register, which I think is true: even if you enable
> -mmmx, gcc won't generate any code that uses MMX registers unless you
> explicitly write some (with asm, builtins or vector types).  If it were a
> general-purpose register instead, that wouldn't work.
>
> > Or with both loops as one asm block, you can bring back "+y"(accumulator)
> > instead of the explicit movd.
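A sketch of the single-asm-block variant, again assuming x86 and GCC; `sum_block_one_asm` is a hypothetical helper, not code from the patch. Both loops live inside one asm statement, and the accumulator enters through a "+y" operand, so gcc places it in an MMX register once instead of the asm doing its own movd from memory. It uses the gcc 3.1+ symbolic operand names mentioned below. The separate `emms` takes the result as an input so that it cannot be scheduled before the accumulator has been read back.

```c
#include <stdint.h>

/* Hypothetical helper: same block sum, both loops in one asm statement.
 * ASSUMPTION: x86 with GCC-style inline asm; stride is in elements. */
static uint32_t sum_block_one_asm(const uint32_t *img, long w, long h, long stride)
{
    if (w <= 0 || h <= 0)
        return 0;                     /* the asm loops below run at least once */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    uint64_t acc = 0;                 /* handed to the asm in an MMX register */
    const uint32_t *row = img;
    long rows = h;
    long x;
    __asm__ volatile(
        "1:\n\t"                                 /* for each row */
        "xor %[x], %[x]\n\t"
        "2:\n\t"                                 /* for each pixel in the row */
        "movd (%[row],%[x],4), %%mm0\n\t"
        "paddd %%mm0, %[acc]\n\t"
        "inc %[x]\n\t"
        "cmp %[w], %[x]\n\t"
        "jl 2b\n\t"
        "lea (%[row],%[stride],4), %[row]\n\t"   /* row += stride */
        "dec %[rows]\n\t"
        "jnz 1b"
        : [acc] "+y"(acc), [row] "+r"(row), [rows] "+r"(rows), [x] "=&r"(x)
        : [w] "r"(w), [stride] "r"(stride)
        : "mm0", "cc", "memory");
    uint32_t result = (uint32_t)acc;  /* low 32-bit lane holds the sum */
    /* Leave MMX state; the input dependency orders emms after the read. */
    __asm__ volatile("emms" : : "r"(result));
    return result;
#else
    /* Portable fallback so the sketch still builds on non-x86 targets. */
    uint32_t result = 0;
    for (long y = 0; y < h; y++)
        for (long i = 0; i < w; i++)
            result += img[y * stride + i];
    return result;
#endif
}
```

The early-clobber on `x` matters: it is written before `w` and `stride` are last read, so it must not share a register with either of them.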
>
> I had that originally, since I wrote gcc 3.1+ code with symbolic register
> names.  I translated it back to gcc 2.95 asm, and the gcc 2.95 code
> benchmarked at the same speed.  So I figured there was no point in having
> two versions.
>
> > > : "=m" (accumulator), "=r" (i), "=g" (j), "=r" (mask), "=r" (image)
> > > : "m" (accumulator), "1" (i), "2" (j), "3" (mask), "4" (image),
> > >   "g" (logo_mask->width), "g" (stride)
> >
> >    : "+m" (accumulator), "+r" (i), "+g" (j), "+r" (mask), "+r" (image)
> >    : "g" (logo_mask->width), "g" (stride)
>
> I've read in several places that you can't use "+" to mark an input/output
> argument in inline asm, and that it only works in machine descriptions.  I
> think that may have changed in newer versions of gcc.
>
> I've tried it before, and gcc doesn't complain about it, but it doesn't
> always work.  With broken constraints you will often get lucky and have
> everything work, and then some random change to an unrelated piece of
> code will make the optimizer choose something that breaks the asm.  So
> it's very hard to make a test case that shows it, but I have seen gcc not
> load the value into a "+r" operand, so I decided to believe the docs and
> use "=r" / "0" instead.
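The two constraint spellings discussed above, side by side on a trivial x86 add; `add_matching` and `add_plus` are hypothetical names used only for illustration. Both declare that the operand is read and then written. Current GCC documentation lists "+" among the modifiers usable in extended asm, so on modern compilers the two forms should be interchangeable, while the "=r" / "0" form is the one the thread relies on for gcc 2.95.

```c
/* Hypothetical helpers: a + b computed by one x86 add instruction.
 * ASSUMPTION: GCC-style inline asm; other targets use the C fallback. */
static int add_matching(int a, int b)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* Output "=r" plus an input tied to it with "0" (same operand slot),
     * so %0 and %1 name the same register and %2 is b. */
    __asm__("addl %2, %0" : "=r"(a) : "0"(a), "g"(b) : "cc");
    return a;
#else
    return a + b;
#endif
}

static int add_plus(int a, int b)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* "+r" marks a single operand as both read and written; %1 is b. */
    __asm__("addl %1, %0" : "+r"(a) : "g"(b) : "cc");
    return a;
#else
    return a + b;
#endif
}
```

Note how the operand numbering shifts: the matching form needs a separate input slot for `a`, so `b` becomes %2 instead of %1.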


