SwScaler performance help (was Re: [MPlayer-dev-eng] [PATCH] vf_osd updates - fully baked?)

Jason Tackaberry tack at sault.org
Tue Sep 13 16:24:24 CEST 2005


On Tue, 2005-09-13 at 10:47 +0200, Reimar Döffinger wrote:
> Btw. this function isn't in any header, so don't use it in the final
> version (use sws_getContextFromCmdLine), though I'm not sure if you
> really want to respect the -sws command line option.

True, I suppose it makes most sense to specify which scalers to use
explicitly.

> >         SwScaler: using unscaled Planar YV12 -> Planar YV12 special converter
> >         
> >         SwScaler: BICUBIC scaler, from Planar YV12 to Planar YV12 using MMX2
> 
> I wonder where these come from?

Sorry, these came from previous parts in the filter chain.

> >         SwScaler: BICUBIC scaler, from BGRA to Planar YV12 using MMX2
> 
> Why does it get scaled? 

I noticed that too and did double check that the src/dest sizes were the
same.  From what I can tell, the reason is that there's no unscaled
special converter from BGR32->YV12, according to swscale.c.

> And BICUBIC is probably overkill, see -sws option.

True.  Using SWS_FAST_BILINEAR drops the time to about 9500 usec.

> btw.2: since the scaler supports strides, it is possible to convert only a
> rectangle by setting up a new context and doing a bit of pointer
> arithmetic.

Wouldn't I need to set up a new context for each invalidate command?  It
would probably be better to keep the existing context and convert a
slice, rather than setting up a new temporary context just to convert a
specific rectangle.  I just checked, and the overhead of creating 3 new
swscalers and freeing them is about 1000 usec, which is enough to be
concerned about.
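
For reference, the pointer arithmetic Reimar describes could look roughly
like the sketch below. PlaneView, bgra_rect_view and yv12_rect_view are
hypothetical names made up for illustration; they are not part of MPlayer
or swscale. The sketch assumes 4 bytes per BGRA pixel and 2x2 subsampled
chroma for YV12, so the rectangle origin has to be even.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: plane pointers plus the unchanged strides. */
typedef struct {
    uint8_t *data[3];
    int      stride[3];
} PlaneView;

/* View of the rectangle whose top-left corner is (x, y) in a packed
 * BGRA image. Only the start pointer moves; the stride stays the same. */
static PlaneView bgra_rect_view(uint8_t *base, int stride, int x, int y)
{
    PlaneView v = { { NULL, NULL, NULL }, { 0, 0, 0 } };
    v.data[0]   = base + (size_t)y * stride + (size_t)x * 4;
    v.stride[0] = stride;
    return v;
}

/* Same idea for planar YV12. The chroma planes are half size in both
 * directions, so their offsets use x/2 and y/2. */
static PlaneView yv12_rect_view(uint8_t *const planes[3],
                                const int strides[3], int x, int y)
{
    PlaneView v;
    assert(x % 2 == 0 && y % 2 == 0);
    v.data[0] = planes[0] + (size_t)y * strides[0] + x;
    v.data[1] = planes[1] + (size_t)(y / 2) * strides[1] + x / 2;
    v.data[2] = planes[2] + (size_t)(y / 2) * strides[2] + x / 2;
    v.stride[0] = strides[0];
    v.stride[1] = strides[1];
    v.stride[2] = strides[2];
    return v;
}
```

The data and stride arrays of the two views would then be handed to the
scaler, using a context created for the rectangle's width and height.
That context is the part that costs the setup time mentioned above.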

With SWS_FAST_BILINEAR, and using existing contexts to convert slices,
my test (one mplayer playing a movie onto the OSD of another mplayer,
scaled to 350x152 on a 640x480 OSD) takes about 3000 usec to convert
each frame, with total CPU usage (with both movies playing) at 65%.

With my current non-swscaler code, the same test takes about 900 usec to
convert each frame, with total CPU usage (with both movies playing) at 58%.

So there is a performance hit here.  There might be some other
improvements I can make.  For example, about 25-30% of that time is
spent putting the alpha channel from the BGRA image into a separate
plane.  Maybe that can be accelerated somehow?
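
To make the alpha step concrete, a plain scalar version could look like
the sketch below. This is not the code from the patch; extract_alpha is a
hypothetical name, and it assumes the bytes of each pixel are stored in
B, G, R, A order in memory. It is only the baseline that an accelerated
(e.g. MMX) version would be measured against.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy the alpha byte of every BGRA pixel into a separate 8-bit plane.
 * Strides are in bytes and may be larger than the visible row width. */
static void extract_alpha(const uint8_t *bgra, int bgra_stride,
                          uint8_t *alpha, int alpha_stride,
                          int width, int height)
{
    assert(bgra_stride >= width * 4 && alpha_stride >= width);
    for (int y = 0; y < height; y++) {
        /* +3 skips B, G and R to land on the first alpha byte of the row */
        const uint8_t *src = bgra + (size_t)y * bgra_stride + 3;
        uint8_t *dst = alpha + (size_t)y * alpha_stride;
        for (int x = 0; x < width; x++)
            dst[x] = src[x * 4];
    }
}
```

Because the function takes strides, it can be run on just the invalidated
rectangle rather than the whole OSD, which may matter more than making
the inner loop faster.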

Is the internal colorspace conversion that bad an idea?  It's certainly
not unprecedented in mplayer.  On the other hand, I understand there are
benefits to using swscaler here (simplifying code, taking advantage of
future improvements in swscaler, etc.).  I'll yield to whatever
approach has the best chance of getting this patch merged.

Cheers,
Jason.