[MPlayer-dev-eng] Improved remove-logo filter
Trent Piepho
xyzzy at speakeasy.org
Tue Oct 31 22:13:29 CET 2006
On Tue, 31 Oct 2006, Guillaume POIRIER wrote:
> Hi Trent,
>
> Were you able to work on improving your patch?
What was there to do? Change the asm to use "+" constraints
even though it doesn't always work with gcc 2.7.2?
>
> On 9/15/06, Trent Piepho <xyzzy at speakeasy.org> wrote:
> > On Thu, 14 Sep 2006, Loren Merritt wrote:
> > > On Thu, 14 Sep 2006, Trent Piepho wrote:
> > > > So I tried writing the inner loop (over one line) in asm so accumulator
> > > > would be kept in mm1 for the whole loop. Gcc still spills and loads
> > > > accumulator for no reason on each outer loop (for each line). This ended
> > > > up being about the same speed.
> > >
> > > Not that there's anything wrong with writing the loops in asm, but you
> > > don't have to do that just to keep the accumulator in an mmreg. "y"
> > > constraints are not needed, unless you _want_ gcc to load/spill values.
> >
> > Good idea, I hadn't thought of trying that. It only works as long as gcc
> > doesn't touch the mmx register, which is true I think: even if you enable
> > -mmmx, gcc won't generate any code that uses mmx registers unless you
> > explicitly write some (with asm, builtins or vector types). If it were
> > a general purpose register instead, that wouldn't work.
> >
> > > Or with both loops as one asm block, you can bring back "+y"(accumulator)
> > > instead of the explicit movd.
> >
> > I had that originally, since I wrote gcc 3.1+ code with symbolic register
> > names. I translated it back to gcc 2.95 asm, and the gcc 2.95 code
> > benchmarked at the same speed. So I figured there was no point in having
> > two versions.
> >
> > > > : "=m" (accumulator), "=r" (i), "=g" (j), "=r" (mask), "=r" (image)
> > > > : "m" (accumulator), "1" (i), "2" (j), "3" (mask), "4" (image),
> > > > "g" (logo_mask->width), "g" (stride)
> > >
> > > : "+m" (accumulator), "+r" (i), "+g" (j), "+r" (mask), "+r" (image)
> > > : "g" (logo_mask->width), "g" (stride)
> >
> > I've read in several places that you can't use "+" to indicate an
> > input/output argument in inline asm, it only works in machine
> > descriptions. I think it may have changed for newer versions of gcc.
> >
> > I've tried it before, and gcc doesn't complain about it, but it doesn't
> > always work. With broken constraints you will often get lucky and have
> > everything work, and then some random change to some piece of unrelated
> > code will have the optimizer make a choice that breaks the asm. So it's
> > very hard to make a test case that shows it, but I had gcc not load the
> > value into a "+r" constraint, so I decided to believe the docs and use "=r"
> > / "0" instead.
>
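
For anyone following along, here is a minimal sketch of the two constraint
styles discussed above, side by side. It is not code from the patch: the
function names (add_plus, add_matching, check_both_forms) and the addl
instruction are made up for illustration, and the asm is only used on x86
with a GCC-compatible compiler, with a plain C fallback elsewhere. On a
compiler that handles "+" correctly in inline asm, both forms should behave
the same.

```c
#include <assert.h>

/* Hypothetical example: add b to a using a "+r" read-write operand. */
int add_plus(int a, int b)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ ("addl %1, %0"
             : "+r" (a)          /* operand 0: read and written */
             : "r" (b)           /* operand 1: read only */
             : "cc");            /* addl modifies the flags */
#else
    a += b;                      /* portable fallback for other targets */
#endif
    return a;
}

/* Same operation using an "=r" output plus a "0" matching input. */
int add_matching(int a, int b)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ ("addl %2, %0"
             : "=r" (a)          /* operand 0: written */
             : "0" (a),          /* operand 1: same register as operand 0 */
               "r" (b)           /* operand 2: read only */
             : "cc");
#else
    a += b;
#endif
    return a;
}

/* Hypothetical self-check: both forms must agree on the result. */
void check_both_forms(void)
{
    assert(add_plus(2, 3) == add_matching(2, 3));
}
```

The "0" in the second form tells gcc to place the input in the same location
as output operand 0, which is what "+r" expresses in a single operand. Note
the operand numbering shifts: b is %1 in the first form and %2 in the second.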