[MPlayer-dev-eng] Improved remove-logo filter
Rich Felker
dalias at aerifal.cx
Wed Sep 13 17:54:45 CEST 2006
On Tue, Sep 12, 2006 at 03:14:27PM -0700, Trent Piepho wrote:
> On Tue, 12 Sep 2006, Rich Felker wrote:
> > On Mon, Sep 11, 2006 at 06:40:40PM -0700, Trent Piepho wrote:
> > > The existing remove-logo filter has several serious bugs. For example,
> > > this logo mask http://www.speakeasy.org/~xyzzy/pictures/mask.png produced
> > > this output http://www.speakeasy.org/~xyzzy/pictures/old_remove_logo.jpg
> > > The improved filter with bugs fixed produces this,
> > > http://www.speakeasy.org/~xyzzy/pictures/new_remove_logo.jpg
> > >
> > > I further optimized the code by writing an MMX2 mask-and-sum core that
> > > provides an additional speedup of about 60% for the blurring operation.
> > > There is of course a C version if MMX2 is not enabled.
> >
> > Your inline asm constraints look wrong (passing in a 64bit int via
> > register??) and you use some _-prefixed variable names which I believe
> > are reserved by C. Idea sounds good tho.
>
> The reason I had _-prefixed variable names no longer applies, so I will
> change those.
>
> The constraints work fine. The 'y' constraint is for an MMX register, so
> gcc will place the 64 bit data into an MMX register it chooses. This way
> gcc has more options of where to put the movq instruction wrt the rest of
> the loop code. The actual code generated:
This is gcc 3/4-specific code, and MPlayer does not require gcc 3/4.
Instead, pass the addresses using memory constraints and load the MMX
registers yourself inside the asm.
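A minimal sketch of what that could look like, not taken from the patch:
the helper name mask8 and the 8-byte AND it performs are assumptions made
up for illustration. Both inputs are passed with "m" constraints and the
asm itself loads them into %mm0, so nothing depends on the 'y' constraint.
A plain C fallback is used when MMX or GNU-style inline asm is unavailable.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the patch): dst[i] = pix[i] & mask[i]
 * for 8 bytes, with operands passed as memory and loaded inside the asm. */
static void mask8(uint8_t *dst, const uint8_t *pix, const uint8_t *mask)
{
#if defined(__GNUC__) && defined(__MMX__) && \
    (defined(__i386__) || defined(__x86_64__))
    uint64_t p, m, out;

    /* Copy into 64-bit temporaries so each "m" operand covers 8 bytes. */
    memcpy(&p, pix, 8);
    memcpy(&m, mask, 8);

    __asm__ volatile (
        "movq %1, %%mm0\n\t"   /* load the pixels into an MMX register */
        "pand %2, %%mm0\n\t"   /* AND with the mask straight from memory */
        "movq %%mm0, %0\n\t"   /* store the result */
        "emms"                 /* leave MMX state before returning to C */
        : "=m" (out)
        : "m" (p), "m" (m)
        : "mm0");

    memcpy(dst, &out, 8);
#else
    /* Portable fallback with the same result. */
    for (int i = 0; i < 8; i++)
        dst[i] = pix[i] & mask[i];
#endif
}
```

The cost Trent describes is visible here: the movq is fixed inside the asm
block, so the compiler cannot schedule it among the loop counter
instructions.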
> If I had used a "m" constraint and moved the data into mmx registers
> myself, then it could not be mixed in with the loop counter instructions
> for better scheduling.
This means you should write the loop in asm. You'll do better than gcc
ever could anyway.
Rich