[MPlayer-dev-eng] Improved remove-logo filter

Rich Felker dalias at aerifal.cx
Wed Sep 13 17:54:45 CEST 2006


On Tue, Sep 12, 2006 at 03:14:27PM -0700, Trent Piepho wrote:
> On Tue, 12 Sep 2006, Rich Felker wrote:
> > On Mon, Sep 11, 2006 at 06:40:40PM -0700, Trent Piepho wrote:
> > > The existing remove-logo filter has several serious bugs.  For example,
> > > this logo mask http://www.speakeasy.org/~xyzzy/pictures/mask.png produced
> > > this output http://www.speakeasy.org/~xyzzy/pictures/old_remove_logo.jpg
> > > The improved filter with bugs fixed produces this,
> > > http://www.speakeasy.org/~xyzzy/pictures/new_remove_logo.jpg
> > >
> > > I further optimized the code by writing a MMX2 mask and sum core, that
> > > provides an additional speedup of about 60% for the blurring operation.
> > > There is of course a C version if MMX2 is not enabled.
> >
> > Your inline asm constraints look wrong (passing in a 64bit int via
> > register??) and you use some _-prefixed variable names which I believe
> > are reserved by C. Idea sounds good tho.
> 
> The reason I had _-prefixed variable names no longer applies, so I will
> change those.
> 
> The constraints work fine.  The 'y' constraint is for an MMX register, so
> gcc will place the 64 bit data into an MMX register it chooses.  This way
> gcc has more options of where to put the movq instruction wrt the rest of
> the loop code.  The actual code generated:

The 'y' constraint is gcc3/4-specific, and MPlayer does not require
gcc3/4. Instead, pass the data with address ("m") constraints and load
the MMX registers yourself inside the asm.
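
A minimal sketch of the memory-operand approach, not taken from the patch. The function name mask8 and the single pand operation are assumptions for illustration only; the real filter core does more work, and a real implementation would issue emms once after the whole loop rather than per call. A plain C path is used when GCC-style x86 inline asm is not available.

```c
#include <stdint.h>

/* Hypothetical helper: AND 8 bytes of pixels with 8 bytes of mask.
 * Operands are passed as memory ("m") and the MMX register is loaded
 * explicitly inside the asm, so no 'y' constraint is needed. */
static uint64_t mask8(const uint64_t *pixels, const uint64_t *mask)
{
    uint64_t out;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ volatile(
        "movq %1, %%mm0\n\t"   /* load the pixels into mm0 ourselves */
        "pand %2, %%mm0\n\t"   /* apply the mask straight from memory */
        "movq %%mm0, %0\n\t"   /* store the result */
        "emms\n\t"             /* leave MMX state so x87 code still works */
        : "=m"(out)
        : "m"(*pixels), "m"(*mask)
        : "mm0");
#else
    out = *pixels & *mask;     /* portable fallback */
#endif
    return out;
}
```

The cost is that the compiler can no longer choose where the load goes, which is the scheduling concern raised in the reply quoted below.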

> If I had used a "m" constraint and moved the data into mmx registers
> myself, then it could not be mixed in with the loop counter instructions
> for better scheduling.

This means you should write the whole loop in asm. You'll do better than
gcc ever could anyway.
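
As a rough sketch of what "the loop in asm" could mean here (again not the patch's code: and_mask_row, its arguments and the plain AND are hypothetical stand-ins for the real mask-and-sum core), the counter, the loads and the branch all live inside one asm statement, so the instruction order is chosen by hand instead of by gcc:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: dst[i] = src[i] & mask[i] for n 8-byte groups. */
static void and_mask_row(uint64_t *dst, const uint64_t *src,
                         const uint64_t *mask, size_t n)
{
    size_t i = 0;
    if (n == 0)
        return;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ volatile(
        "1:\n\t"
        "movq (%1,%0,8), %%mm0\n\t"   /* load src[i] */
        "pand (%2,%0,8), %%mm0\n\t"   /* AND with mask[i] */
        "movq %%mm0, (%3,%0,8)\n\t"   /* store to dst[i] */
        "inc %0\n\t"                  /* i++ */
        "cmp %4, %0\n\t"              /* compare i with n */
        "jb 1b\n\t"                   /* loop while i < n (unsigned) */
        "emms\n\t"                    /* once, after the whole loop */
        : "+&r"(i)
        : "r"(src), "r"(mask), "r"(dst), "r"(n)
        : "mm0", "memory", "cc");
#else
    for (; i < n; i++)                /* portable fallback */
        dst[i] = src[i] & mask[i];
#endif
}
```

Only general-purpose register and memory operands are used, so nothing here depends on the compiler knowing about MMX registers.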

Rich



