[MPlayer-dev-eng] Improved remove-logo filter

Trent Piepho xyzzy at speakeasy.org
Sat Sep 16 00:27:40 CEST 2006


On Thu, 14 Sep 2006, Loren Merritt wrote:
> On Thu, 14 Sep 2006, Trent Piepho wrote:
> > So I tried writing the inner loop (over one line) in asm so accumulator
> > would be kept in mm1 for the whole loop.  Gcc still spills and loads
> > accumulator for no reason on each outer loop (for each line).  This ended
> > up being about the same speed.
>
> Not that there's anything wrong with writing the loops in asm, but you
> don't have to do that just to keep the accumulator in an mmreg. "y"
> constraints are not needed, unless you _want_ gcc to load/spill values.

Good idea, I hadn't thought of trying that.  It only works as long as gcc
doesn't touch the mmx register, which I think is true: even if you enable
-mmmx, gcc won't generate any code that uses mmx registers unless you
explicitly write some (with asm, builtins or vector types).  If it were
a general purpose register, that wouldn't work.
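
For illustration, here is a minimal sketch of that idea, not the filter's
actual loop.  The function name sum_u32 is made up.  The running sum stays
in mm1 across separate asm statements, and no operand or clobber mentions
mm0/mm1, so it relies entirely on the assumption that gcc emits no mmx
code of its own in between.  A plain C fallback is used when mmx is not
available.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example: sums n 32-bit values (wrapping modulo 2^32). */
static uint32_t sum_u32(const uint32_t *v, int n)
{
    uint32_t total = 0;
    int i;

    assert(n >= 0);
#if defined(__GNUC__) && defined(__MMX__) && \
    (defined(__i386__) || defined(__x86_64__))
    /* Clear the accumulator.  mm1 is never declared to gcc. */
    __asm__ volatile ("pxor %%mm1, %%mm1" : : );
    for (i = 0; i < n; i++) {
        /* Load one value into mm0 and add it to the sum held in mm1. */
        __asm__ volatile ("movd %0, %%mm0\n\t"
                          "paddd %%mm0, %%mm1"
                          : : "m" (v[i]));
    }
    /* Store the low 32 bits of mm1, then leave mmx state with emms. */
    __asm__ volatile ("movd %%mm1, %0\n\t"
                      "emms"
                      : "=m" (total));
#else
    /* Fallback for targets without mmx. */
    for (i = 0; i < n; i++)
        total += v[i];
#endif
    return total;
}
```

If anything between the asm statements did use mm1 (another asm block,
builtins, vector types), the sum would be silently corrupted, which is
exactly the limitation described above.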

> Or with both loops as one asm block, you can bring back "+y"(accumulator)
> instead of the explicit movd.

I had that originally, since I wrote gcc 3.1+ code with symbolic register
names.  I translated it back to gcc 2.95 asm, and the gcc 2.95 code
benchmarked at the same speed.  So I figured there was no point in having
two versions.

> > : "=m" (accumulator), "=r" (i), "=g" (j), "=r" (mask), "=r" (image)
> > : "m" (accumulator), "1" (i), "2" (j), "3" (mask), "4" (image),
> >   "g" (logo_mask->width), "g" (stride)
>
>    : "+m" (accumulator), "+r" (i), "+g" (j), "+r" (mask), "+r" (image)
>    : "g" (logo_mask->width), "g" (stride)

I've read in several places that you can't use "+" to indicate an
input/output argument in inline asm; it only works in machine
descriptions.  I think that may have changed in newer versions of gcc.

I've tried it before, and gcc doesn't complain about it, but it doesn't
always work.  With broken constraints you will often get lucky and have
everything work, and then some random change to some piece of unrelated
code will have the optimizer make a choice that breaks the asm.  So it's
very hard to make a test case that shows it, but I have had gcc fail to
load the value into a "+r" operand, so I decided to believe the docs and
use "=r" / "0" instead.
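
To make the two spellings concrete, here is a small sketch with made-up
helper names (add_matching, add_readwrite, check_add_forms), unrelated to
the filter code.  The first uses an "=r" output tied to a "0" input, the
second a single "+r" operand.  Current gcc documentation does describe "+"
as a read-write modifier for extended asm operands, so on a recent
compiler both should behave the same; whether that holds for gcc 2.95 is
precisely what is in doubt here.  Non-x86 targets fall back to plain C.

```c
#include <assert.h>

/* Hypothetical helper: acc + n using the "=r" / "0" pairing. */
static unsigned add_matching(unsigned acc, unsigned n)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* %0 is the output; the "0" input is forced into the same register. */
    __asm__ ("addl %2, %0"
             : "=r" (acc)
             : "0" (acc), "r" (n)
             : "cc");
#else
    acc += n;
#endif
    return acc;
}

/* The same operation written with one "+r" read-write operand. */
static unsigned add_readwrite(unsigned acc, unsigned n)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* %0 is both read and written; %1 is the value to add. */
    __asm__ ("addl %1, %0"
             : "+r" (acc)
             : "r" (n)
             : "cc");
#else
    acc += n;
#endif
    return acc;
}

/* Both forms should agree with plain C unsigned addition. */
static void check_add_forms(unsigned acc, unsigned n)
{
    assert(add_matching(acc, n) == acc + n);
    assert(add_readwrite(acc, n) == acc + n);
}
```

Note that the operand numbers shift between the two forms: the matching
input counts as operand 1 in the first version, so n is %2 there and %1
in the second.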


