[MPlayer-dev-eng] Help with MMX asm code

Fri Oct 24 01:01:41 CEST 2003

Jason Tackaberry (tack at auc.ca):

> > I would recommend instead: a = layer_alpha/256 * img.A[pos] as
> > division by 255 is expensive and it's cheap to keep around 4 bytes
> > instead of only 1 byte for your layer alpha.
> 
> In my implementation I pulled layer_alpha/255 out of the inner loops
> so although it's expensive it's barely noticed.

  You replaced it by representing layer_alpha as a float, which incurs
an int -> float conversion, a float multiply, and a float -> int
conversion instead.  That is much more expensive than the alternative:

   ((layer_alpha * img.A[pos]) + 128) / 256
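
  As a rough sketch of that integer path in C: the helper names
prescale_layer_alpha() and scale_alpha() are made up for illustration,
and the rescaling of layer_alpha from 0..255 to 0..256 is my assumption
(without it, a fully opaque layer over a fully opaque pixel comes out
as 254 rather than 255).  The prescale would be done once, outside the
inner loops.

```c
#include <stdint.h>

/* Hypothetical helper: map an 8-bit layer alpha (0..255) onto 0..256 so
 * that 255 becomes exactly 256.  Meant to run once per layer, not per
 * pixel. */
static inline uint32_t prescale_layer_alpha(uint8_t layer_alpha)
{
    return (uint32_t)layer_alpha + (layer_alpha >> 7);
}

/* Per-pixel step: ((layer_alpha * pixel_alpha) + 128) / 256, with the
 * division by 256 written as a shift.  Expects layer_alpha_256 in
 * 0..256, so the result always fits in 8 bits. */
static inline uint8_t scale_alpha(uint32_t layer_alpha_256, uint8_t pixel_alpha)
{
    return (uint8_t)((layer_alpha_256 * pixel_alpha + 128) >> 8);
}
```

  In the inner loop this would be used roughly as
a = scale_alpha(la, img.A[pos]), with la computed once beforehand.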

> >   First, this seems wrong.  If we look at a block of four pixels:
> > 
> >   A B
> >   C D
> > 
> > You're using the alpha from pixel D to apply to the Cb/Cr
> > components.  For MPEG2, the chroma samples are positioned halfway
> > between A and C, so if you want to be really correct, you should
> > filter the alpha channel, for example by taking the average alpha
> > value between A and C.  If this is expensive, at least use the alpha
> > of pixel A and not pixel D.
> 
> Would averaging the alpha between B and D also be acceptable?

  No.

> Also, why would taking the alpha of pixel A for Cb/Cr be any better
> than using D, or for that matter B or C?  At least, it's not clear to
> me why it's any less correct (or more incorrect, in this case), and
> certainly my eye can't tell the difference.

  Because the Cb/Cr values in the images in an MPEG2 stream are the
average (or a better filtering) of the Cb/Cr values of pixels A and C,
not of pixels B and D.  Those Cb/Cr values are not for 'all four
pixels'.  If you want to know the chroma at pixel B or at pixel D, you
have to interpolate to get it.

  The human eye is not very sensitive to colour, so using pixel D you
might not notice it right away.  You will most likely notice it on the
edges of blue and red objects in particular, or something colourful next
to something grey.
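
  To make the A/C averaging concrete, here is a minimal sketch in C.
The function name chroma_alpha() and the layout (one alpha byte per
luma pixel, rows 'stride' bytes apart, even image height) are
assumptions for illustration, not the layout used in your code.

```c
#include <stdint.h>

/* Alpha to apply to the chroma sample (cx, cy).  That sample belongs to
 * the 2x2 luma block whose top-left pixel A is at (2*cx, 2*cy); pixel C
 * sits directly below A.  The result is the rounded average of the two.
 * Assumes row 2*cy + 1 exists, i.e. the image height is even. */
static inline uint8_t chroma_alpha(const uint8_t *alpha, int stride,
                                   int cx, int cy)
{
    const uint8_t *a = alpha + (2 * cy) * stride + 2 * cx; /* pixel A */
    const uint8_t *c = a + stride;                         /* pixel C */
    return (uint8_t)((a[0] + c[0] + 1) >> 1);
}
```

  If even that is too expensive, returning a[0] alone gives the cheaper
"use the alpha of pixel A" variant.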

> The code that needs optimizing is the nested loop in put_image(),
> which is essentially the C code for the algorithm I described earlier.

  I'll take a look sometime if I have a chance.

  -Billy