[MPlayer-cvslog] r35669 - trunk/libmpcodecs/vf_ass.c

Reimar Döffinger Reimar.Doeffinger at gmx.de
Wed Dec 12 19:43:50 CET 2012

On Wed, Dec 12, 2012 at 06:13:44PM +0100, upsuper wrote:
> +DECLARE_ASM_CONST(16, uint32_t, SSE_32BIT_80H[4]) = { [0 ... 3] = 0x80 };
> +DECLARE_ASM_CONST(16, uint32_t, SSE_32BIT_MAP[4]) = { [0 ... 3] = 0x102 };

All-uppercase names generally and in MPlayer code are considered
reserved for macros/defines.
Also I am quite certain that the [0 ... 3] syntax is a GNU extension,
it really shouldn't be used in MPlayer ever (also not in switch/cases).
I don't really think it improves readability in this case anyway.

> +    CLEAN_XMM(7);

A macro for one single use seems like overkill.
A more serious (though mostly theoretical) issue is
that there's no guarantee that the xmm7 register will
still be clear by the time you get into the next asm block.
Which is among the reasons why implementing all loops in asm
tends to be a good idea.

> +                : : [dst]   "r" (dst + i * stride),
> +                    [alpha] "r" (alpha + i * outw),
> +                    [src_y] "r" (src_y + i * outw),
> +                    [src_u] "r" (src_u + i * outw),
> +                    [src_v] "r" (src_v + i * outw),
> +                    [j]     "r" (xmin),
> +                    [xmax]  "g" (xmax),
> +                    [f]     "r" (is_uyvy)

This requires 7-8 registers.
You'll have to add a check that that many are actually available.
See HAVE_EBX_AVAILABLE and HAVE_EBP_AVAILABLE, on 32 bit your code
will need both available.

> +#if HAVE_SSE4
> +        if (gCpuCaps.hasSSE4 && outw % 8 == 0)
> +            vf->priv->render_frame = render_frame_yuv422_sse4;

The width check is overkill, first having a C part to
do the remainder is a good idea.
However in this specific case, it is possible to simply
make sure all buffers have enough extra padding that it
does not matter if you run over the end of a line, then
you can round the with up in the asm code.
Also it would be nice to have a more baseline requirement
than SSE4, I would assume SSE4 instructions or even SSE3
are not really necessary to still get very close to the
same speedup.

More information about the MPlayer-cvslog mailing list