SwScaler performance help (was Re: [MPlayer-dev-eng] [PATCH] vf_osd updates - fully baked?)
Michael Niedermayer
michaelni at gmx.at
Tue Sep 13 18:51:35 CEST 2005
Hi
On Mon, Sep 12, 2005 at 10:34:58PM -0400, Jason Tackaberry wrote:
> On Mon, 2005-09-12 at 11:59 -0400, Jason Tackaberry wrote:
> > > As I mentioned before, since you have to separate out the alpha in an
> > > extra plane anyway, you can first do that and then scale. I think. Btw.
> > > the swscaler can do the conversion and scaling in one step AFAIK.
> >
> > It does. I'll try to rework the code to use Swscaler. I agree that
> > it's just a better design that way. I may have to ask for help. :)
>
> Initial results are not very encouraging. This approach, using
> swscaler, is nearly 3 times slower than my current code. My code will
> convert a 640x480 BGRA image to 5 planes (luma, 2 chroma, luma alpha,
> chroma alpha) in about 4200 usec. Using swscaler to convert BGR32 to
> YV12, then separating the alpha channel to a separate plane and using
> swscaler to scale Y800 for luma and chroma alpha, this takes about 11500
> usec.
>
> Here's the code I'm using for swscaler. In vf_config:
>
>   sws_getFlagsAndFilterFromCmdLine(&sws_flags, &srcFilterParam,
>                                    &dstFilterParam);
>   priv->sws_bgr32 = sws_getContext(priv->w, priv->h, IMGFMT_BGR32, width, height, IMGFMT_YV12,
>                                    get_sws_cpuflags() | sws_flags | SWS_PRINT_INFO,
>                                    srcFilterParam, dstFilterParam, NULL);
>   priv->sws_y800_l = sws_getContext(priv->w, priv->h, IMGFMT_Y800, width, height, IMGFMT_Y800,
>                                     get_sws_cpuflags() | sws_flags | SWS_PRINT_INFO,
>                                     srcFilterParam, dstFilterParam, NULL);
>   priv->sws_y800_c = sws_getContext(priv->w, priv->h, IMGFMT_Y800, width>>1, height>>1, IMGFMT_Y800,
>                                     get_sws_cpuflags() | sws_flags | SWS_PRINT_INFO,
>                                     srcFilterParam, dstFilterParam, NULL);
>
> Note that I'm testing with a fixed OSD, so that means priv->w == width
> and priv->h == height. (In other words, no scaling is happening except
> for sws_y800_c.)
>
> And for the conversion (it's messy, but it's just test code):
>
>   unsigned char *alpha = malloc(priv->w*priv->h);
>   int i, j;
>   for (i=3, j=0; i < priv->w * priv->h * 4; i+=4, j++)
>       alpha[j] = priv->bgra_imgbuf[i];
>   {
>       uint8_t *src[3] = {priv->bgra_imgbuf, NULL, NULL};
>       int src_strides[3] = {priv->w * 4, 0, 0};
>       uint8_t *dst[3] = {priv->y, priv->u, priv->v};
>       int dst_strides[3] = {priv->mpi_w, priv->mpi_w>>1, priv->mpi_w>>1};
>       sws_scale_ordered(priv->sws_bgr32, src, src_strides, 0, priv->h, dst, dst_strides);
>   }
>   uint8_t *src[3] = {alpha, NULL, NULL};
>   int src_strides[3] = {priv->w, 0, 0};
>   {
>       uint8_t *dst[3] = {priv->a, NULL, NULL};
>       int dst_strides[3] = {priv->w, 0, 0};
>       sws_scale_ordered(priv->sws_y800_l, src, src_strides, 0, priv->h, dst, dst_strides);
>   }
>   {
>       uint8_t *dst[3] = {priv->uva, NULL, NULL};
>       int dst_strides[3] = {priv->w>>1, 0, 0};
>       sws_scale_ordered(priv->sws_y800_c, src, src_strides, 0, priv->h, dst, dst_strides);
>   }
>   free(alpha);
>
> (Note the malloc/free isn't being included in the timings since it should be moved elsewhere.)
>
> Here's the info messages from swscaler:
>
> SwScaler: using unscaled Planar YV12 -> Planar YV12 special converter
>
> SwScaler: BICUBIC scaler, from Planar YV12 to Planar YV12 using MMX2
>
> SwScaler: BICUBIC scaler, from BGRA to Planar YV12 using MMX2
> SwScaler: using unscaled Planar Y800 -> Planar Y800 special converter
>
> SwScaler: BICUBIC scaler, from Planar Y800 to Planar Y800 using MMX2
>
> (Note that I've aligned bgra_imgbuf.)
>
> An increase from 4200 usec to 11500 usec is no small potatoes. Am I
> doing anything wrong? I must be. When I comment out the two last
> scales and just do BGR32 to YV12, it's still slower (about 8000 usec).
> I would have expected swscaler to be faster.
The bgr32->yv12 path in sws doesn't seem to be optimized at all; it's uncommon
for codec playback to require bgr32->yv12 conversion.
I really think you should add your code to the swscaler, and if that's really
too hard then at least put it in postproc/... A filter is not the correct place
for it.
Btw, ensure that all arrays are aligned to 16-byte boundaries, and the
linesizes/strides too.
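
A minimal sketch of the alignment advice above, assuming a C11 toolchain. The
helper names align16() and alloc_plane16() are hypothetical and are not part of
swscaler or MPlayer; aligned_alloc() is the standard C11 call, used here only
for illustration. The idea is to round each linesize up to a multiple of 16 and
to allocate each plane on a 16-byte boundary, so that every row starts aligned.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Round a line size in bytes up to the next multiple of 16. */
static int align16(int n)
{
    return (n + 15) & ~15;
}

/*
 * Hypothetical helper: allocate one plane of `height` rows whose base
 * address and stride are both multiples of 16 bytes.
 * The chosen stride is written to *stride. Returns NULL on failure.
 * Release the plane with free().
 */
static uint8_t *alloc_plane16(int width_bytes, int height, int *stride)
{
    assert(stride != NULL);
    *stride = align16(width_bytes);
    /* The size is a multiple of 16 because the stride is, which is
     * what C11 aligned_alloc() expects. */
    return aligned_alloc(16, (size_t)*stride * (size_t)height);
}
```

With something like this, the alpha plane from the test code could be obtained
as alloc_plane16(priv->w, priv->h, &stride), and that stride (rather than
priv->w) would then be passed in the strides array.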
[...]
--
Michael