[MPlayer-dev-eng] [RFC] Color management filter
Reimar Döffinger
Reimar.Doeffinger at stud.uni-karlsruhe.de
Tue Aug 26 21:33:22 CEST 2008
On Tue, Aug 26, 2008 at 09:22:34AM -0400, Yue Shi Lai wrote:
> This new patch addresses all issues have been raised (i.e. including
> interface to vo_gl). There is still few comment gaps and comments in need
> of update (especially where the modifications have been applied), which I
> will fill up, but I would appreciate any other issues you might find.
Hm, I do not at all like the global variable the vo_gl support adds.
Avoiding it would be a bit of work though, e.g. setting lcms through
a global option with suboptions instead of the filter options or
something like that.
> +#include "../libavutil/avutil.h"
> +#include "../libavutil/x86_cpu.h"
without the "../", also in MPlayer code I think you are supposed to use
cpudetect.h and not libavutil/x86_cpu.h, particularly since I do not
know if you can expect the later one to always compile on non-x86.
> +static void clut_rgb2yuv(uint8_t buf_out[3], uint8_t buf_in[3],
> + const int bt_709)
> +{
> + /* Dense ITU-R BT.601 RGB -> Y'CbCr conversion matrix with Kr =
> + 0.299, Kb = 0.114 (ITU-R BT.709 specifies Kr = 0.2126, Kb =
> + 0.0722 for HDTV, however, this is unlikely being used by any
> + computer video at the moment). The redundant a[5] is only
> + stored once, see the explicit matrix multiplication below */
> + static const double a_bt_601[] = {
> + 21827.0 / 85000.0, 42851.0 / 85000.0, 4161.0 /42500.0,
> + -16744.0 / 112965.0, -32872.0 / 112965.0,
> + 112.0 / 255.0, -65744.0 / 178755.0, -4256.0 / 59585.0
> + };
> + /* The rational expression for ITU-R BT.709 for reference */
> + static const double a_bt_709[] = {
> + 77599.0 / 425000.0, 32631.0 / 53125.0,
> + 26353.0 / 425000.0, -119056.0 / 1182945.0,
> + -133504.0 / 394315.0, 112.0 / 255.0,
> + -133504.0 / 334645.0, -40432.0 / 1003935.0
> + };
Given that it costs only 16 bytes I would go for readability and make
those arrays [3][3].
Also, in all such cases, aligning it so that the ',' and '/' are aligned
makes it nicer to look at.
> + for(i = 0; i < count; i++) {
> + /* The following coefficients are 8.7 fix point approximation
> + of: {-9/128, 387/448, 31/128, -1/28}, {-1/16, 31/56, 29/48,
> + -2/21}, {-3/128, 97/448, 117/128, -3/28} */
You can avoid these comments and be more flexible with e.g. macros:
#define FP_BITS 7
#define FP_FRAC(a, b) ((int)((double)a / (double)b * (1 << FP_BITS)))
> + pclut[stride] = av_clip_uint8(
> + (111 * pclut[0] - 9 * pclut[-4 * stride] +
> + 31 * pclut[4 * stride] -
> + 5 * pclut[7 * stride]) >> 7);
Becomes
pclut[stride] = av_clip_uint8((FP_FRAC(9, 128) * pclut[0] + ...) >> FP_BITS);
> + pclut[2 * stride] = av_clip_uint8(
> + (71 * pclut[0] - (pclut[-4 * stride] << 3) +
I'd really leave the *8 -> << 3 conversion to the compiler, contrary
to division it should be able to handle it.
> +static inline void interpolate_left_sse2(
> + uint8_t *pclut, const size_t stride, const size_t count)
> +{
> + /* Coefficients */
> + /* These constant are in the format n + (n << 16), which when
> + loaded, avoids an additional pshuflw (i.e. before pshufd) to
> + generate the correct vector layout. */
> + static uint32_t factor[] = {
> + /* For pass 1 */
> + 3 + (3 << 16), 28 + (28 << 16), 33 + (33 << 16),
> + /* For pass 2 */
> + 7 + (7 << 16), 60 + (60 << 16)
#define DUP_16BIT(a) (a * 0x10001)
and use that.
Also that should be const (though you will have to fix the asm
constraints for that I think).
> + /* The left edge requires 2 passes on IA-32, since there are not
> + sufficient %xmm registers to store 2 sets of coefficients. */
> + /* Aligned body, 1st pass */
> + asm (
> + /* PASS 1 */
> + /* Load 3 into %xmm5 */
> + "movd %5, %%xmm5\n\t"
> + "pshufd $0x0, %%xmm5, %%xmm5\n\t"
> + /* Load 28 into %xmm6 */
> + "movd %6, %%xmm6\n\t"
> + "pshufd $0x0, %%xmm6, %%xmm6\n\t"
> + /* Load 33 into %xmm7 */
> + "movd %7, %%xmm7\n\t"
> + "pshufd $0x0, %%xmm7, %%xmm7\n\t"
> + /* Zero %xmm0 (for byte to word conversion using punpcklbw) */
> + "pxor %%xmm0, %%xmm0\n\t"
> + /* Loop initialization */
> + /* Load %[er]di = pclut */
> + "mov %10, %%" REG_d "\n\t"
> + /* Load %[er]cx = count as the near loop counter */
> + "mov %12, %%" REG_c "\n"
> + /* Begin of loop */
> + "1:\n\t"
IMO, please put the comments aligned to the right of the code.
As they are now I think they make it harder to read and understand than
if there were no comments at all.
And try to align the asm code, too, i.e.
> + "punpcklbw %%xmm0, %%xmm1\n\t"
> + /* Save pclut[4 * stride] */
> + "movdqa %%xmm1, %%xmm4\n\t"
> + /* Now %xmm1 = %xmm4 = pclut[4 * stride] */
becomes
> + "punpcklbw %%xmm0, %%xmm1 \n\t" // Save pclut[4 * stride]
> + "movdqa %%xmm1, %%xmm4 \n\t" // %xmm1 = %xmm4 = pclut[4 * stride]
(though I would not really even comment a movdqa).
> +/**
> + * Returns the string representation of the MPlayer pixel format fmt
> + *
> + * @param[in] fmt MPlayer pixel format
> + * @return string representation of fmt
> + */
> +static char *format_name(const unsigned int fmt)
> +{
> + return (fmt == IMGFMT_YV12 ? "IMGFMT_YV12" :
> + fmt == IMGFMT_I420 ? "IMGFMT_I420" :
> + fmt == IMGFMT_IYUV ? "IMGFMT_IYUV" :
> + fmt == IMGFMT_BGR24 ? "IMGFMT_BGR24" :
> + "other");
> +}
vo_format_name from imgformat.c
> +static char *primary_name(const unsigned int primary)
And const in function arguments passed by value is usually pointless
and IMO only makes changing/improving the code harder.
> + return (primary == PRIMARY_SMPTE ? "SMPTE RP-145" :
> + primary == PRIMARY_EBU ? "EBU 3213" :
> + primary == PRIMARY_REC709 ? "ITU-R BT.709" :
> + "other");
A switch () { case ...: } would be nicer I think.
Also, your return type is not really right, it should be "const char *".
> + /* The coefficients a, b, c are modulo the determinant, since the
> + an overall scale is not needed when calculating the root or the
> + extremum. */
> + a = sum_x * (sum_x_3 * sum_y + sum_x_2 * sum_x_y -
> + sum_x * sum_x_2_y) +
> + n * (sum_x_2 * sum_x_2_y - sum_x_3 * sum_x_y) -
> + sum_x_2 * sum_x_2 * sum_y;
> + b = sum_x * (sum_x_2 * sum_x_2_y - sum_x_4 * sum_y) +
> + n * (sum_x_4 * sum_x_y - sum_x_3 * sum_x_2_y) +
> + sum_x_2 * (sum_x_3 * sum_y - sum_x_2 * sum_x_y);
> + c = sum_x_3 * (sum_x * sum_x_2_y + sum_x_2 * sum_x_y -
> + sum_x_3 * sum_y) +
> + sum_x_2 * (sum_x_4 * sum_y - sum_x_2 * sum_x_2_y) -
> + sum_x * sum_x_4 * sum_x_y;
This is really hard to read (possibly also due to tabs)
> + discriminant = b * b - 4.0 * a * c;
> + if(discriminant > 0) {
> + /* The quadratic function has two roots, with the larger root
> + being the black point. In the original Ad*be
> + implementation, this case was incorrectly left out. */
> + const double sign_b = b > 0.0 ? 1.0 : b < 0.0 ? -1.0 : 0.0;
> + const double q = -0.5 * (b + sign_b * sqrt(discriminant));
That's just wrong for b == 0. FFSIGN(b) probably is right.
> +static int put_image(struct vf_instance_s *vf, mp_image_t *mpi,
> + double pts)
> +{
> + static const size_t clut_resolution = 1 << 8;
> + struct vf_priv_s *p = vf->priv;
> + /* Pixel format specific pointers */
> + uint8_t *y_in, *cb_in, *cr_in, *y_out, *cb_out, *cr_out;
> + uint8_t *bgr_out;
> + int i;
> +
> + if(g_cm.vo_yuv2rgb_clut != 0)
> + /* The video output plugin is performing the CLUT lookup for
> + us. */
> + return vf_next_put_image(vf, mpi, pts);
No idea why it works, but I do not think this is valid. vf_get_image
must always be called. You can avoid the overhead by implementing
direct-rendering via a get_image function, but a solution that avoids
the need to use this filter with -vo gl might be nicer in all aspects
anyway.
> +static void uninit(struct vf_instance_s *vf)
> +{
> + if(vf->priv->bpc == BPC_MA)
> + lcms_abpc_delete_transform(vf->priv->lcms_abpc_transform);
> + else if(vf->priv->lcms_transform != NULL)
> + cmsDeleteTransform(vf->priv->lcms_transform);
> + if(vf->priv->lcms_src_profile != NULL)
> + cmsCloseProfile(vf->priv->lcms_src_profile);
> + if(vf->priv->lcms_src_profile != NULL)
> + cmsCloseProfile(vf->priv->lcms_dest_profile);
> + if(vf->priv->lcms_src_profile != NULL)
> + cmsCloseProfile(vf->priv->lcms_proof_profile);
I think in MPlayer code it is more common to not use the != NULL and I
also consider it more readable, but it's your decision.
> +static const char *parse_args(struct vf_priv_s *p, const char *args)
> +{
> + char *buf, *argsdup, *pargs;
> + int i, k, len;
> +
> + len = strlen(args);
> + if((buf = (char *)malloc(len * sizeof(char))) == NULL)
> + return NULL;
> + if((argsdup = strdup(args)) == NULL) {
> + free(buf);
> + return NULL;
> + }
> +
> + /* For sscanf based parsing: find the separation between arguments
> + (regular expression: ":(?=[^=]*=)") and replace it with a
> + space. */
Maybe it did not exist when you started writing this filter, but both
the main MPlayer option parser as well as the code in subopt-helper.c
(which e.g. vo gl uses) would be much easier to use (and avoid the
strdup in addition ;-) ).
Greetings,
Reimar Döffinger
More information about the MPlayer-dev-eng
mailing list