[MPlayer-dev-eng] [RFC] Color management filter

Tue Aug 26 21:33:22 CEST 2008

On Tue, Aug 26, 2008 at 09:22:34AM -0400, Yue Shi Lai wrote:
> This new patch addresses all issues have been raised (i.e. including 
> interface to vo_gl). There is still few comment gaps and comments in need 
> of update (especially where the modifications have been applied), which I 
> will fill up, but I would appreciate any other issues you might find.

Hm, I do not at all like the global variable the vo_gl support adds.
Avoiding it would be a bit of work though, e.g. setting lcms through
a global option with suboptions instead of the filter options or
something like that.

> +#include "../libavutil/avutil.h"
> +#include "../libavutil/x86_cpu.h"

without the "../", also in MPlayer code I think you are supposed to use
cpudetect.h and not libavutil/x86_cpu.h, particularly since I do not
know if you can expect the later one to always compile on non-x86.

> +static void clut_rgb2yuv(uint8_t buf_out[3], uint8_t buf_in[3],
> +						 const int bt_709)
> +{
> +	/* Dense ITU-R BT.601 RGB -> Y'CbCr conversion matrix with Kr =
> +	   0.299, Kb = 0.114 (ITU-R BT.709 specifies Kr = 0.2126, Kb =
> +	   0.0722 for HDTV, however, this is unlikely being used by any
> +	   computer video at the moment). The redundant a[5] is only
> +	   stored once, see the explicit matrix multiplication below */
> +	static const double a_bt_601[] = {
> +		21827.0 / 85000.0, 42851.0 / 85000.0, 4161.0 /42500.0,
> +		-16744.0 / 112965.0, -32872.0 / 112965.0,
> +		112.0 / 255.0, -65744.0 / 178755.0, -4256.0 / 59585.0
> +	};
> +	/* The rational expression for ITU-R BT.709 for reference */
> +	static const double a_bt_709[] = {
> +		77599.0 / 425000.0, 32631.0 / 53125.0,
> +		26353.0 / 425000.0, -119056.0 / 1182945.0,
> +		-133504.0 / 394315.0, 112.0 / 255.0,
> +		-133504.0 / 334645.0, -40432.0 / 1003935.0
> +	};

Given that it costs only 16 bytes I would go for readability and make
those arrays [3][3].
Also, in all such cases, aligning it so that the ',' and '/' are aligned
makes it nicer to look at.

> +	for(i = 0; i < count; i++) {
> +		/* The following coefficients are 8.7 fix point approximation
> +		   of: {-9/128, 387/448, 31/128, -1/28}, {-1/16, 31/56, 29/48,
> +		   -2/21}, {-3/128, 97/448, 117/128, -3/28} */

You can avoid these comments and be more flexible with e.g. macros:
#define FP_BITS 7
#define FP_FRAC(a, b) ((int)((double)a / (double)b * (1 << FP_BITS)))

> +		pclut[stride] = av_clip_uint8(
> +			(111 * pclut[0] - 9 * pclut[-4 * stride] +
> +			 31 * pclut[4 * stride] -
> +			 5 * pclut[7 * stride]) >> 7);

Becomes
pclut[stride] = av_clip_uint8((FP_FRAC(9, 128) * pclut[0] + ...) >> FP_BITS);

> +		pclut[2 * stride] = av_clip_uint8(
> +			(71 * pclut[0] - (pclut[-4 * stride] << 3) +

I'd really leave the *8 -> << 3 conversion to the compiler, contrary
to division it should be able to handle it.

> +static inline void interpolate_left_sse2(
> +	uint8_t *pclut, const size_t stride, const size_t count)
> +{
> +	/* Coefficients */
> +	/* These constant are in the format n + (n << 16), which when
> +	   loaded, avoids an additional pshuflw (i.e. before pshufd) to
> +	   generate the correct vector layout. */
> +	static uint32_t factor[] = {
> +		/* For pass 1 */
> +		3 + (3 << 16), 28 + (28 << 16), 33 + (33 << 16),
> +		/* For pass 2 */
> +		7 + (7 << 16), 60 + (60 << 16)

#define DUP_16BIT(a) (a * 0x10001)
and use that.
Also that should be const (though you will have to fix the asm
constraints for that I think).

> +	/* The left edge requires 2 passes on IA-32, since there are not
> +	   sufficient %xmm registers to store 2 sets of coefficients. */
> +	/* Aligned body, 1st pass */
> +	asm (
> +		/* PASS 1 */
> +		/* Load 3 into %xmm5 */
> +		"movd	%5, %%xmm5\n\t"
> +		"pshufd	$0x0, %%xmm5, %%xmm5\n\t"
> +		/* Load 28 into %xmm6 */
> +		"movd	%6, %%xmm6\n\t"
> +		"pshufd	$0x0, %%xmm6, %%xmm6\n\t"
> +		/* Load 33 into %xmm7 */
> +		"movd	%7, %%xmm7\n\t"
> +		"pshufd	$0x0, %%xmm7, %%xmm7\n\t"
> +		/* Zero %xmm0 (for byte to word conversion using punpcklbw) */
> +		"pxor	%%xmm0, %%xmm0\n\t"
> +		/* Loop initialization */
> +		/* Load %[er]di = pclut */
> +		"mov	%10, %%" REG_d "\n\t"
> +		/* Load %[er]cx = count as the near loop counter */
> +		"mov	%12, %%" REG_c "\n"
> +		/* Begin of loop */
> +		"1:\n\t"

IMO, please put the comments aligned to the right of the code.
As they are now I think they make it harder to read and understand than
if there were no comments at all.
And try to align the asm code, too, i.e.

> +             "punpcklbw      %%xmm0, %%xmm1\n\t"
> +             /* Save pclut[4 * stride] */
> +             "movdqa %%xmm1, %%xmm4\n\t"
> +		/* Now %xmm1 = %xmm4 = pclut[4 * stride] */

becomes

> +             "punpcklbw  %%xmm0, %%xmm1  \n\t"      // Save pclut[4 * stride]
> +             "movdqa     %%xmm1, %%xmm4  \n\t"      // %xmm1 = %xmm4 = pclut[4 * stride]

(though I would not really even comment a movdqa).

> +/**
> + * Returns the string representation of the MPlayer pixel format fmt
> + *
> + * @param[in] fmt MPlayer pixel format
> + * @return string representation of fmt
> + */
> +static char *format_name(const unsigned int fmt)
> +{
> +	return (fmt == IMGFMT_YV12 ? "IMGFMT_YV12" :
> +			fmt == IMGFMT_I420 ? "IMGFMT_I420" :
> +			fmt == IMGFMT_IYUV ? "IMGFMT_IYUV" :
> +			fmt == IMGFMT_BGR24 ? "IMGFMT_BGR24" :
> +			"other");
> +}

vo_format_name from imgformat.c

> +static char *primary_name(const unsigned int primary)

And const in function arguments passed by value is usually pointless
and IMO only makes changing/improving the code harder.

> +	return (primary == PRIMARY_SMPTE ? "SMPTE RP-145" :
> +			primary == PRIMARY_EBU ? "EBU 3213" :
> +			primary == PRIMARY_REC709 ? "ITU-R BT.709" :
> +			"other");

A switch () { case ...: } would be nicer I think.
Also, your return type is not really right, it should be "const char *".

> +	/* The coefficients a, b, c are modulo the determinant, since the
> +	   an overall scale is not needed when calculating the root or the
> +	   extremum. */
> +	a = sum_x * (sum_x_3 * sum_y + sum_x_2 * sum_x_y -
> +				 sum_x * sum_x_2_y) +
> +		n * (sum_x_2 * sum_x_2_y - sum_x_3 * sum_x_y) -
> +		sum_x_2 * sum_x_2 * sum_y;
> +	b = sum_x * (sum_x_2 * sum_x_2_y - sum_x_4 * sum_y) +
> +		n * (sum_x_4 * sum_x_y - sum_x_3 * sum_x_2_y) +
> +		sum_x_2 * (sum_x_3 * sum_y - sum_x_2 * sum_x_y);
> +	c = sum_x_3 * (sum_x * sum_x_2_y + sum_x_2 * sum_x_y -
> +				   sum_x_3 * sum_y) +
> +		sum_x_2 * (sum_x_4 * sum_y - sum_x_2 * sum_x_2_y) -
> +		sum_x * sum_x_4 * sum_x_y;

This is really hard to read (possibly also due to tabs)

> +	discriminant = b * b - 4.0 * a * c;
> +	if(discriminant > 0) {
> +		/* The quadratic function has two roots, with the larger root
> +		   being the black point. In the original Ad*be
> +		   implementation, this case was incorrectly left out. */
> +		const double sign_b = b > 0.0 ? 1.0 : b < 0.0 ? -1.0 : 0.0;
> +		const double q = -0.5 * (b + sign_b * sqrt(discriminant));

That's just wrong for b == 0. FFSIGN(b) probably is right.

> +static int put_image(struct vf_instance_s *vf, mp_image_t *mpi,
> +					 double pts)
> +{
> +	static const size_t clut_resolution = 1 << 8;
> +	struct vf_priv_s *p = vf->priv;
> +	/* Pixel format specific pointers */
> +	uint8_t *y_in, *cb_in, *cr_in, *y_out, *cb_out, *cr_out;
> +	uint8_t *bgr_out;
> +	int i;
> +
> +	if(g_cm.vo_yuv2rgb_clut != 0)
> +		/* The video output plugin is performing the CLUT lookup for
> +		   us. */
> +		return vf_next_put_image(vf, mpi, pts);

No idea why it works, but I do not think this is valid. vf_get_image
must always be called. You can avoid the overhead by implementing
direct-rendering via a get_image function, but a solution that avoids
the need to use this filter with -vo gl might be nicer in all aspects
anyway.

> +static void uninit(struct vf_instance_s *vf)
> +{
> +	if(vf->priv->bpc == BPC_MA)
> +		lcms_abpc_delete_transform(vf->priv->lcms_abpc_transform);
> +	else if(vf->priv->lcms_transform != NULL)
> +		cmsDeleteTransform(vf->priv->lcms_transform);
> +	if(vf->priv->lcms_src_profile != NULL)
> +		cmsCloseProfile(vf->priv->lcms_src_profile);
> +	if(vf->priv->lcms_src_profile != NULL)
> +		cmsCloseProfile(vf->priv->lcms_dest_profile);
> +	if(vf->priv->lcms_src_profile != NULL)
> +		cmsCloseProfile(vf->priv->lcms_proof_profile);

I think in MPlayer code it is more common to not use the != NULL and I
also consider it more readable, but it's your decision.

> +static const char *parse_args(struct vf_priv_s *p, const char *args)
> +{
> +	char *buf, *argsdup, *pargs;
> +	int i, k, len;
> +
> +	len = strlen(args);
> +	if((buf = (char *)malloc(len * sizeof(char))) == NULL)
> +		return NULL;
> +	if((argsdup = strdup(args)) == NULL) {
> +		free(buf);
> +		return NULL;
> +	}
> +
> +	/* For sscanf based parsing: find the separation between arguments
> +	   (regular expression: ":(?=[^=]*=)") and replace it with a
> +	   space. */

Maybe it did not exist when you started writing this filter, but both
the main MPlayer option parser as well as the code in subopt-helper.c
(which e.g. vo gl uses) would be much easier to use (and avoid the
strdup in addition ;-) ).

Greetings,
Reimar Döffinger