[MPlayer-dev-eng] [PATCH] VF Overlay

Sat Aug 8 16:02:58 CEST 2009

On Sat, 2009-08-08 at 15:03 +0200, Benjamin Zores wrote:
> +    for (i = priv->w * rh; i > 0; i--, p_bgr32 += 4, p_alpha++, p_bgr24 += 3) {
> +        /* Moving 32 bits is faster than 3 separate assignments (or one 16
> +         * bit and one 8 bit move). The BGR24 buffer has one extra byte
> +         * allocated to prevent an overrun.
> +         */
> +        memcpy(p_bgr24, p_bgr32, 3);
> +        *p_alpha = p_bgr32[3];
> +    }

I benchmarked the original code fairly carefully, as this code path is
well travelled and should be as fast as possible.  I suppose for the
sake of just getting it into MPlayer we can worry about this later, but
memcpy() is quite a bit slower than even copying 3 bytes separately.

When you get a chance, have a look at the AV_WN32 macro as Reimar
suggested.

I guess it would look like:

   AV_WN32(p_bgr24, AV_RN32(p_bgr32));
   *p_alpha = p_bgr32[3];
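
For reference, here is a rough standalone sketch of that loop. It assumes rn32() and wn32() as hypothetical stand-ins for AV_RN32/AV_WN32 (unaligned 32-bit read and write in native byte order), and split_bgr32() is just an illustrative name, not something from the patch. Each 32-bit store spills one byte into the next pixel, which the following iteration overwrites, so the BGR24 buffer still needs the one extra byte mentioned in the comment above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for AV_RN32/AV_WN32: unaligned 32-bit
 * read and write in native byte order. A fixed-size memcpy like
 * this is normally compiled down to a single move. */
static inline uint32_t rn32(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}

static inline void wn32(uint8_t *p, uint32_t v)
{
    memcpy(p, &v, sizeof(v));
}

/* Split npixels of BGR32 into packed BGR24 plus a separate alpha
 * plane. bgr24 must have room for npixels * 3 + 1 bytes, because
 * every 32-bit store writes one byte past the current pixel. */
static void split_bgr32(const uint8_t *bgr32, uint8_t *bgr24,
                        uint8_t *alpha, size_t npixels)
{
    size_t i;

    for (i = 0; i < npixels; i++, bgr32 += 4, bgr24 += 3, alpha++) {
        wn32(bgr24, rn32(bgr32)); /* 4th byte is overwritten by the next pixel */
        *alpha = bgr32[3];
    }
}
```

Whether this actually beats three byte stores would need to be benchmarked again on the real code path.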

> +        "movq      %%mm0, (%0)                        \n\t"
> +        :  "+&r" (dst_byte), "+r" (dst_alpha)
> +        :  "r" (byte), "r" (alpha), "r" (global_alpha)
> +    );

Sorry, I got this backwards in my last email: dst_byte is written after
all inputs are read, whereas dst_alpha isn't, so early clobber should go on
dst_alpha.  Reimar, is that right?
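
To illustrate the rule I have in mind, here is a minimal sketch that is unrelated to the patch itself (add_early() is a made-up name, and the asm is only built on x86 with GCC-style inline assembly). The output is written by the first instruction, before input b has been read, so it needs "&". Without it the compiler is free to put out and b in the same register, and the movl would destroy b before the addl reads it.

```c
/* Returns a + b. On x86 this uses inline asm in which the output
 * operand is written before the last input is read, which is the
 * case the "&" (early clobber) modifier exists for. */
static int add_early(int a, int b)
{
    int out;

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ (
        "movl %1, %0 \n\t"   /* out is written here ...           */
        "addl %2, %0 \n\t"   /* ... but b is only read here       */
        : "=&r" (out)        /* early clobber: no sharing with a/b */
        : "r" (a), "r" (b)
        : "cc"
    );
#else
    out = a + b;             /* plain C fallback on other targets */
#endif

    return out;
}
```

An operand that is only written by the final instruction, after every input has been consumed, does not need the modifier.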

> +static void blend_plane_MMX(int w, int slice_h, uint8_t *dst, uint8_t *src,
[...]
> +                : "+r" (dst), "+r" (src), "+r" (overlay), "+r" (alpha)
> +                : "m" (wr)

Reimar pointed out that all the above "+r" might need to be changed to
"+&r" as well.  Since wr is read last, after all outputs are written, I
assume that's true.

> +static int control(struct vf_instance_s *vf, int request, void *data)
> +{
> +    return vf_next_control(vf, request, data);
> +}

Now that this function does nothing, you can just remove it entirely
(and the reference to it in open()).

Thanks,
Jason.