SwScaler performance help (was Re: [MPlayer-dev-eng] [PATCH] vf_osd updates - fully baked?)
Jason Tackaberry
tack at sault.org
Tue Sep 13 04:34:58 CEST 2005
On Mon, 2005-09-12 at 11:59 -0400, Jason Tackaberry wrote:
> > As I mentioned before, since you have to separate out the alpha in an
> > extra plane anyway, you can first do that and then scale. I think. Btw.
> > the swscaler can do the conversion and scaling in one step AFAIK.
>
> It does. I'll try to rework the code to use Swscaler. I agree that
> it's just a better design that way. I may have to ask for help. :)
Initial results are not very encouraging. This approach, using
swscaler, is nearly 3 times slower than my current code. My code will
convert a 640x480 BGRA image to 5 planes (luma, 2 chroma, luma alpha,
chroma alpha) in about 4200 usec. Using swscaler to convert BGR32 to
YV12, then copying the alpha channel into its own plane and running
that plane through swscaler as Y800 to produce the luma and chroma
alpha planes, takes about 11500 usec.
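For readers who want a concrete picture of the "5 planes" layout, here is a minimal sketch of a single-pass conversion. It is not the code being timed in this mail, which is not shown. The function name bgra_to_planes, the BT.601 limited-range integer coefficients, the byte order B, G, R, A in memory, the 2x2 box average for chroma and chroma alpha, and the tightly packed output planes are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert packed BGRA to five planes in one pass.
 * Assumptions (not taken from the original mail):
 *   - bytes in memory are B, G, R, A
 *   - BT.601 limited-range integer coefficients
 *   - chroma and chroma alpha are a rounded 2x2 box average
 *   - output planes are tightly packed (stride == plane width)
 *   - w and h are even
 */
void bgra_to_planes(const uint8_t *bgra, int stride, int w, int h,
                    uint8_t *y, uint8_t *u, uint8_t *v,
                    uint8_t *a, uint8_t *uva)
{
    int cw = w >> 1;    /* width of the chroma-sized planes */
    int row, col, k;

    assert(w % 2 == 0 && h % 2 == 0);

    for (row = 0; row < h; row += 2) {
        const uint8_t *s0 = bgra + row * stride;   /* top row of the 2x2 block */
        const uint8_t *s1 = s0 + stride;           /* bottom row */

        for (col = 0; col < w; col += 2) {
            int rs = 0, gs = 0, bs = 0, as = 0;
            int r, g, b, ci;

            /* Full-resolution luma and alpha for each of the 4 pixels. */
            for (k = 0; k < 4; k++) {
                int dx = k & 1, dy = k >> 1;
                const uint8_t *p = (dy ? s1 : s0) + (col + dx) * 4;
                int idx = (row + dy) * w + col + dx;

                y[idx] = (uint8_t)(((66 * p[2] + 129 * p[1] + 25 * p[0] + 128) >> 8) + 16);
                a[idx] = p[3];
                bs += p[0]; gs += p[1]; rs += p[2]; as += p[3];
            }

            /* Rounded average of the 2x2 block. */
            r = (rs + 2) >> 2;
            g = (gs + 2) >> 2;
            b = (bs + 2) >> 2;

            /* The +128 chroma offset is folded in before the shift so the
             * shifted value is never negative. */
            ci = (row >> 1) * cw + (col >> 1);
            u[ci]   = (uint8_t)((-38 * r - 74 * g + 112 * b + 128 + (128 << 8)) >> 8);
            v[ci]   = (uint8_t)((112 * r - 94 * g - 18 * b + 128 + (128 << 8)) >> 8);
            uva[ci] = (uint8_t)((as + 2) >> 2);
        }
    }
}
```

For a 640x480 source with no padding this would be called with stride = 640 * 4, y and a buffers of 640*480 bytes, and u, v and uva buffers of 320*240 bytes. Doing everything in one pass reads each source pixel only once, which is one plausible reason a hand-written converter can beat three separate scaler invocations when no real scaling is needed.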
Here's the code I'm using for swscaler. In vf_config:
    sws_getFlagsAndFilterFromCmdLine(&sws_flags, &srcFilterParam,
                                     &dstFilterParam);
    priv->sws_bgr32 = sws_getContext(priv->w, priv->h, IMGFMT_BGR32, width, height, IMGFMT_YV12,
                                     get_sws_cpuflags() | sws_flags | SWS_PRINT_INFO,
                                     srcFilterParam, dstFilterParam, NULL);
    priv->sws_y800_l = sws_getContext(priv->w, priv->h, IMGFMT_Y800, width, height, IMGFMT_Y800,
                                      get_sws_cpuflags() | sws_flags | SWS_PRINT_INFO,
                                      srcFilterParam, dstFilterParam, NULL);
    priv->sws_y800_c = sws_getContext(priv->w, priv->h, IMGFMT_Y800, width>>1, height>>1, IMGFMT_Y800,
                                      get_sws_cpuflags() | sws_flags | SWS_PRINT_INFO,
                                      srcFilterParam, dstFilterParam, NULL);
Note that I'm testing with a fixed OSD, so that means priv->w == width
and priv->h == height. (In other words, no scaling is happening except
for sws_y800_c.)
And for the conversion (it's messy, but it's just test code):
    unsigned char *alpha = malloc(priv->w*priv->h);
    int i, j;
    /* Copy every 4th byte (the A of BGRA) into a packed alpha plane. */
    for (i=3, j=0; i < priv->w * priv->h * 4; i+=4, j++)
        alpha[j] = priv->bgra_imgbuf[i];
    {
        /* BGR32 -> YV12 colour planes. */
        uint8_t *src[3] = {priv->bgra_imgbuf, NULL, NULL};
        int src_strides[3] = {priv->w * 4, 0, 0};
        uint8_t *dst[3] = {priv->y, priv->u, priv->v};
        int dst_strides[3] = {priv->mpi_w, priv->mpi_w>>1, priv->mpi_w>>1};
        sws_scale_ordered(priv->sws_bgr32, src, src_strides, 0, priv->h, dst, dst_strides);
    }
    uint8_t *src[3] = {alpha, NULL, NULL};
    int src_strides[3] = {priv->w, 0, 0};
    {
        /* Alpha plane at luma resolution. */
        uint8_t *dst[3] = {priv->a, NULL, NULL};
        int dst_strides[3] = {priv->w, 0, 0};
        sws_scale_ordered(priv->sws_y800_l, src, src_strides, 0, priv->h, dst, dst_strides);
    }
    {
        /* Alpha plane at chroma resolution (half width, half height). */
        uint8_t *dst[3] = {priv->uva, NULL, NULL};
        int dst_strides[3] = {priv->w>>1, 0, 0};
        sws_scale_ordered(priv->sws_y800_c, src, src_strides, 0, priv->h, dst, dst_strides);
    }
    free(alpha);
(Note the malloc/free isn't being included in the timings since it should be moved elsewhere.)
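As an illustration of how per-call figures like these can be collected with allocation kept outside the timed region, here is a minimal sketch. It is not the measurement code used for the numbers in this mail. The names mean_usec, fill_plane and struct plane are hypothetical, and fill_plane is only a stand-in workload. gettimeofday() is not a monotonic clock, so a clock adjustment during the run would skew the result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/time.h>

/* Hypothetical stand-in for the work being measured. */
struct plane {
    uint8_t *data;
    size_t   size;
};

/* Example workload: overwrite a preallocated buffer. */
void fill_plane(void *arg)
{
    struct plane *p = arg;
    memset(p->data, 0x80, p->size);
}

/*
 * Call fn(arg) `iterations` times and return the mean wall-clock time
 * per call in microseconds. Buffers are expected to be allocated by the
 * caller beforehand, so malloc/free stay out of the measurement.
 */
double mean_usec(void (*fn)(void *), void *arg, int iterations)
{
    struct timeval start, end;
    double total;
    int i;

    assert(iterations > 0);

    gettimeofday(&start, NULL);
    for (i = 0; i < iterations; i++)
        fn(arg);
    gettimeofday(&end, NULL);

    total = (end.tv_sec - start.tv_sec) * 1e6
          + (end.tv_usec - start.tv_usec);
    return total / iterations;
}
```

Averaging over many iterations smooths out scheduler noise and cache warm-up, which matters when the quantities being compared are only a few milliseconds each.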
Here are the info messages from swscaler:
SwScaler: using unscaled Planar YV12 -> Planar YV12 special converter
SwScaler: BICUBIC scaler, from Planar YV12 to Planar YV12 using MMX2
SwScaler: BICUBIC scaler, from BGRA to Planar YV12 using MMX2
SwScaler: using unscaled Planar Y800 -> Planar Y800 special converter
SwScaler: BICUBIC scaler, from Planar Y800 to Planar Y800 using MMX2
(Note that I've aligned bgra_imgbuf.)
An increase from 4200 usec to 11500 usec is no small potatoes. Am I
doing anything wrong? I must be. When I comment out the last two
scales and just do BGR32 to YV12, it's still slower (about 8000 usec).
I would have expected swscaler to be faster.
Cheers,
Jason.