[MPlayer-dev-eng] Help with MMX asm code

Thu Oct 23 17:04:46 CEST 2003

On Thu, 2003-10-23 at 10:29, Billy Biggs wrote:
> Please describe your memory layout better and maybe it will make more
> sense to me :)

Yes, if it's not fully obvious to you by now, I am flying by the seat of
my pants. :)

The image representation was taken from the original bmovl filter. The
image is stored as YUVA, with each channel in a separate array: the Y and
A channels are width*height bytes each, and the U and V channels are
width*height/4 bytes each (one chroma sample per 2x2 block of pixels).
This is also how mplayer represents the image (planar YV12), except that
there is no alpha channel.
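To make the layout concrete, here is a minimal C sketch of the four planes
and how a pixel (x, y) maps to an offset in each. The struct and function
names (yuva_image, yuva_alloc, luma_index, chroma_index) are hypothetical,
not bmovl's actual code, and the sketch assumes even width and height so
that width*height/4 is exact.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical container for the planar YUVA layout; the names are
 * illustrative and not taken from bmovl itself. */
typedef struct {
    int width, height;
    uint8_t *y, *a;  /* width * height bytes each */
    uint8_t *u, *v;  /* (width / 2) * (height / 2) bytes each */
} yuva_image;

/* Offset of pixel (x, y) in the Y or A plane. */
static size_t luma_index(const yuva_image *img, int x, int y)
{
    return (size_t)y * img->width + x;
}

/* Offset of the chroma sample covering pixel (x, y) in the U or V plane:
 * one sample per 2x2 block of luma pixels. */
static size_t chroma_index(const yuva_image *img, int x, int y)
{
    return (size_t)(y / 2) * (img->width / 2) + x / 2;
}

/* Release all planes; free(NULL) is a no-op, so partial allocations are fine. */
static void yuva_free(yuva_image *img)
{
    free(img->y);
    free(img->a);
    free(img->u);
    free(img->v);
    img->y = img->a = img->u = img->v = NULL;
}

/* Allocate all four planes; returns 0 on success, -1 on failure.
 * Assumes even dimensions so width * height / 4 is exact. */
static int yuva_alloc(yuva_image *img, int width, int height)
{
    size_t luma = (size_t)width * height;
    assert(width > 0 && height > 0 && width % 2 == 0 && height % 2 == 0);
    img->width = width;
    img->height = height;
    img->y = malloc(luma);
    img->a = malloc(luma);
    img->u = malloc(luma / 4);
    img->v = malloc(luma / 4);
    if (!img->y || !img->a || !img->u || !img->v) {
        yuva_free(img);
        return -1;
    }
    return 0;
}
```

For a 640x480 image, pixel (3, 5) lands at offset 3203 in Y and A, and at
offset 641 in U and V.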

So the computation is done straightforwardly, byte for byte, between
corresponding elements of the src and dst arrays. With mpimg as the video
frame and img as the overlay image stored as described above, and
assuming mpimg and img have the same dimensions, the process is roughly
this:

foreach y in height:
	foreach x in width:
		pos = y * width + x
		a = layer_alpha * img.A[pos] / 255
		mpimg.Y[pos] = blend(mpimg.Y[pos], img.Y[pos], a)
		if y % 2 and x % 2:
			pos = y/2 * width/2 + x/2
			mpimg.U[pos] = blend(mpimg.U[pos], img.U[pos], a)
			mpimg.V[pos] = blend(mpimg.V[pos], img.V[pos], a)

def blend(p1, p2, a):
	# Which you pointed out is wrong: >> 8 divides by 256, not 255,
	# so a == 255 never yields exactly p2
	return ( (255-a)*p1 + a*p2 ) >> 8
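Here is a rough C translation of the loop above, meant as a scalar
reference to check any MMX version against. It is a sketch under a few
assumptions: the planes are passed as separate pointers (overlay_yuv420 is
a hypothetical name), both images have the same even dimensions, and blend
divides by 255 with rounding instead of shifting by 8, so a == 255 gives
exactly p2. It computes layer_alpha * A before dividing by 255 so integer
division doesn't truncate to 0, and it takes the chroma alpha from the
top-left pixel of each 2x2 block rather than the bottom-right one used in
the pseudocode.

```c
#include <assert.h>
#include <stdint.h>

/* Blend p2 over p1 with alpha a in 0..255. Dividing by 255 (rounded)
 * instead of shifting by 8 means a == 255 yields exactly p2. */
static uint8_t blend(uint8_t p1, uint8_t p2, unsigned a)
{
    assert(a <= 255);
    return (uint8_t)(((255 - a) * p1 + a * p2 + 127) / 255);
}

/* Scalar reference for the overlay loop. Frame (d*) and overlay (s*)
 * are assumed to share the same even dimensions. */
static void overlay_yuv420(uint8_t *dy, uint8_t *du, uint8_t *dv,
                           const uint8_t *sy, const uint8_t *su,
                           const uint8_t *sv, const uint8_t *sa,
                           int width, int height, unsigned layer_alpha)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int pos = y * width + x;
            /* Multiply first so layer_alpha / 255 does not truncate to 0. */
            unsigned a = (layer_alpha * sa[pos] + 127) / 255;
            dy[pos] = blend(dy[pos], sy[pos], a);
            /* One chroma sample per 2x2 block: use its top-left pixel. */
            if (y % 2 == 0 && x % 2 == 0) {
                int cpos = (y / 2) * (width / 2) + x / 2;
                du[cpos] = blend(du[cpos], su[cpos], a);
                dv[cpos] = blend(dv[cpos], sv[cpos], a);
            }
        }
    }
}
```

With a fully opaque overlay pixel (A == 255, layer_alpha == 255) the frame
pixel is replaced exactly, which the original >> 8 version can't do.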

My thought was to use MMX to parallelize the blend computation so that
several bytes are processed at once. But maybe for now I should go back
to the beginning and rework the above approach?
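One way the MMX idea might look, as a sketch assuming GCC or Clang's
<mmintrin.h> intrinsics: four bytes each of dst, src and alpha are widened
to 16-bit lanes (punpcklbw against zero, since a product of two 8-bit
values needs 16 bits), blended with pmullw/paddw, divided by 255 with an
add-and-shift trick that is exact for products of two 8-bit values, and
packed back with packuswb. blend4 and div255 are hypothetical names; when
__MMX__ isn't defined, a plain C fallback with the same arithmetic is used.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#if defined(__MMX__)
#include <mmintrin.h>
#endif

/* Exact round(x / 255) for x in 0..255*255 using only adds and shifts,
 * so it maps directly onto 16-bit SIMD lanes. */
static unsigned div255(unsigned x)
{
    x += 128;
    return (x + (x >> 8)) >> 8;
}

/* Blend 4 overlay bytes src[] onto dst[] with per-pixel alpha[] (0..255). */
static void blend4(uint8_t *dst, const uint8_t *src, const uint8_t *alpha)
{
#if defined(__MMX__)
    int32_t d32, s32, a32;
    memcpy(&d32, dst, 4);
    memcpy(&s32, src, 4);
    memcpy(&a32, alpha, 4);
    __m64 zero = _mm_setzero_si64();
    /* Widen 4 bytes to 4 x 16-bit lanes (punpcklbw with zero). */
    __m64 d = _mm_unpacklo_pi8(_mm_cvtsi32_si64(d32), zero);
    __m64 s = _mm_unpacklo_pi8(_mm_cvtsi32_si64(s32), zero);
    __m64 a = _mm_unpacklo_pi8(_mm_cvtsi32_si64(a32), zero);
    __m64 inv = _mm_sub_pi16(_mm_set1_pi16(255), a);          /* 255 - a */
    /* (255 - a) * d + a * s, at most 255 * 255, fits in 16 bits. */
    __m64 t = _mm_add_pi16(_mm_mullo_pi16(inv, d), _mm_mullo_pi16(a, s));
    /* div255 per lane: t += 128; t = (t + (t >> 8)) >> 8 */
    t = _mm_add_pi16(t, _mm_set1_pi16(128));
    t = _mm_srli_pi16(_mm_add_pi16(t, _mm_srli_pi16(t, 8)), 8);
    /* Pack lanes back to bytes (packuswb) and store the low 4 bytes. */
    d32 = _mm_cvtsi64_si32(_mm_packs_pu16(t, zero));
    memcpy(dst, &d32, 4);
    _mm_empty();  /* clear MMX state so later x87 float code works */
#else
    /* Portable fallback: the same arithmetic, one lane at a time. */
    for (int i = 0; i < 4; i++)
        dst[i] = (uint8_t)div255((255u - alpha[i]) * dst[i]
                                 + alpha[i] * src[i]);
#endif
}
```

In a real filter loop, _mm_empty() would be called once after the loop
rather than per call, and each iteration would load 8 bytes, unpacking
both the low and high halves, to use the full register width.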

Jason.

-- 
Jason Tackaberry  ::  tack at auc.ca  :: 705-949-2301 x330 
Academic Computing Support Specialist
Information Technology Services
Algoma University College  ::  www.auc.ca