[Ffmpeg-devel] Re: about mmx instructions

Sat Sep 3 09:09:43 CEST 2005

	Howdy everybody!

	(hey don't you guys never take holidays?)
	(or: do you spare funny questions *for* holidays? ;)

On Thu, 2005-09-01 at 16:07, Michael Niedermayer wrote:

> anyone got a nicer derivation/proof?

	Well, for a more generic approach, one should remember
	that it's all about emulating the propagation of 
	carry/borrow.

	Let's take for example the addition of two 8-bits values,
	followed by a shift by 2:  (x+y)>>2

	The intermediate value (x+y) doesn't fit in 8bits (because
	of the carry), but one can split the addition in two parts:
	the 2 lowest bits are added, and the resulting carry is reported
	to the result of adding the highest 6 bits. Namely:

	(x+y)>>2 = [ (x>>2) + (y>>2) ] + [((x&3) + (y&3)) >> 2)]

	The first term is the carry-less addition of the 6 hi bits, the
	second terms adds the 2 lowest bits and only keeps the carry.

	Now, note that you can also retrieve the carry from the
	result itself, noting that:
	  (x+y)>>2 = (x>>2) + (y>>2) + { ((x+y)>>2) ^ (x>>2) ^ (y>>2) }
             (x-y)>>2 = (x>>2) - (y>>2) - { ((x-y)>>2) ^ (x>>2) ^ (y>>2) }
	These relations may sound silly, since we're using the result
	(x+y)>>2 ... to find the result (x+y)>>2. But recall that the
	problem is to make all the intermediate values fit into 8 bits.

	Ok, same goes on for difference, and one can also replace the
	arithmetic ops by logical ones, if one recalls that, in 
	boolean algebra, the '&' logical operator is equivalent to the
	arithmetic multiplication, and the '^' operator equivalent to
	the addition:

	(x+y)>>1 = (x>>1) + (y>>1) + { x&y }
	(x+y+1)>>1 = (x>>1) + (y>>1) + { x|y }
          (x-y)>>1 = (x>>1) - (y>>1) - { y&~x }
	(x-y+1)>>1 = (x>>1) - (y>>1) + { x&~y }

	where, for short, {a} means 'lsb of a', that is: {a} = a&1.
	Note that: {a+b} = {a}^{b}, {a*b} = {a}^{b}, {a+a}={0}, etc..

	Ok, now we can stir the bit soup, using : x=(p0+(q1>>2)+1)
	and y=(q0+(p1>>2)+1.

	First we have:

	(p1-q1)>>2 = (p1>>2) - (q1>>2) - Epsilon

	Where Epsilon={0,1}  is { ((p1-q1)>>2) ^ (p1>>2) ^ (q1>>2) }

	So that:

	(q0-p0+(p1-q1)>>2 + 1) = [q0+(p1>>2)+1] - [p0+(q1>>2)+1]  + 1-Epsilon

	Now, you can use the generic relation:

	(x-y + e)>>1 = [ (x>>1) + { x&~y&e } ] - [ (y>>1) + { y&~x&~e } ]

	for e = 0 or 1.

	Replacing e by 1-Epsilon = 1^ { ((p1-q1)>>2) ^ (p1>>2) ^ (q1>>2) }
	will lead to the result after some boolean stirring, if you recall,
	for instance, that:  { p0+(q1>>2) } = {p0}^{(q1>>2)} as far as the 
	LSB is concerned.

> or even a faster implementation?

	Hint: (d&a) and (d&~a) can be simplified to not use 'a'
	twice. Left as exercise to the reader...

	Hope it helps,
	bye!
Skal