[Ffmpeg-devel] Re: about mmx instructions
Michael Niedermayer
michaelni
Thu Sep 1 16:07:29 CEST 2005
Hi
On Wed, Aug 31, 2005 at 07:26:17PM +0200, thomas.kunlin at free.fr wrote:
> Hello,
>
> I am a PhD student working on H.264, and I have a question concerning
> a calculation found in the loop filter implementation of ffmpeg.
> In the H264_DEBLOCK_P0_Q0 macro (h264dsp_mmx.c) :
> delta = (q0-p0+((p1-q1)>>2)+1)>>1
> is obtained by the following calculation :
> delta = e-f, or: -delta = f-e
> where :
> f = ((p0+(q1>>2)+1)>>1) + (d&~a)
> e = ((q0+(p1>>2)+1)>>1) + (d&a)
> d = (c^b)&~(b^a)^1
typo, it should be d = (c^b)&~(b^a)&1
> c = q0^(p1>>2)
> b = p0^(q1>>2)
> a = p0^q0^((p1-q1)>>2)
>
> I have had a bad time trying to understand how this does the trick.
> Could you give me some explanations/pointers about the creation of such a
> magical formula :-) ?
pointers: hmm, ffmpeg's source & http://www.aggregate.org/MAGIC/
explanation: ok, that's easier :)
the reason why it can't be calculated with the trivial
(q0-p0+((p1-q1)>>2)+1)>>1
is that the intermediates and the result would not fit within 8 bits, and
doing it in 16 bits would run at half the speed (an MMX register holds 8
bytes but only 4 16-bit words), plus the cost of converting 8<->16 bits
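A brute-force check of that range claim, as a minimal C sketch. The helper names sar and trivial_delta_range are made up for this illustration; sar is a floor shift so the result does not depend on how the compiler treats >> on negative ints.

```c
#include <assert.h>

/* Floor division by 2^n (an arithmetic right shift) that does not rely on
   the implementation-defined result of >> on negative ints. */
static int sar(int x, int n)
{
    assert(n >= 0 && n < 30);
    return x >= 0 ? x >> n : -((-x + (1 << n) - 1) >> n);
}

/* Range of the trivial formula over all unsigned 8-bit inputs. It only
   depends on the differences q0-p0 and p1-q1, each in [-255, 255]. */
static void trivial_delta_range(int *tmin, int *tmax, int *rmin, int *rmax)
{
    *tmin = *rmin = 1 << 20;
    *tmax = *rmax = -(1 << 20);
    for (int dq = -255; dq <= 255; dq++)          /* dq = q0 - p0 */
        for (int dp = -255; dp <= 255; dp++) {    /* dp = p1 - q1 */
            int t = dq + sar(dp, 2) + 1;          /* value before the final >>1 */
            int r = sar(t, 1);                    /* delta itself */
            if (t < *tmin) *tmin = t;
            if (t > *tmax) *tmax = t;
            if (r < *rmin) *rmin = r;
            if (r > *rmax) *rmax = r;
        }
}
```

Both ranges, -318..319 before the final shift and -159..159 after it, exceed what a signed or unsigned byte can hold.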
so we first need to decide how to represent the result in 8 bits;
e-f with both e and f unsigned 8-bit integers seems like an obvious choice,
at least when reading the code, maybe not before writing it though :)
as the inputs are also unsigned 8-bit, e and f should be
f = ((p0+(q1>>2)+1)>>1)
e = ((q0+(p1>>2)+1)>>1)
ignoring rounding of the >> operations ...
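Each of e and f is a rounding average of two unsigned bytes, (x+y+1)>>1, which is exactly what the x86 instruction pavgb computes per byte lane; that the MMX code uses it for this step is an assumption here, not something stated in the mail. A scalar sketch of one lane (avg_round_u8, e_uncorrected and f_uncorrected are hypothetical names), showing the values stay within a byte:

```c
#include <assert.h>
#include <stdint.h>

/* One lane of a rounding byte average: (x + y + 1) >> 1. The 9-bit sum
   only exists inside the computation; the result always fits in a byte. */
static uint8_t avg_round_u8(uint8_t x, uint8_t y)
{
    return (uint8_t)(((unsigned)x + y + 1) >> 1);
}

/* e and f as in the mail, before any least-significant-bit correction. */
static uint8_t e_uncorrected(uint8_t q0, uint8_t p1) { return avg_round_u8(q0, (uint8_t)(p1 >> 2)); }
static uint8_t f_uncorrected(uint8_t p0, uint8_t q1) { return avg_round_u8(p0, (uint8_t)(q1 >> 2)); }

/* Largest possible e (f is symmetric), scanning every q0, p1. */
static unsigned max_e_uncorrected(void)
{
    unsigned m = 0;
    for (unsigned q0 = 0; q0 < 256; q0++)
        for (unsigned p1 = 0; p1 < 256; p1++) {
            unsigned v = e_uncorrected((uint8_t)q0, (uint8_t)p1);
            if (v > m)
                m = v;
        }
    return m;
}

/* e.g. assert(max_e_uncorrected() == 159); */
```

The largest uncorrected e (or f) is 159, reached at q0 = 255 and p1>>2 = 63, so even a +1 correction added later still fits in a byte.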
so the only thing left is to fix the least significant bits
a = p0^q0^((p1-q1)>>2)
gives the correct least significant bit before the +1)>>1
c = q0^(p1>>2)
b = p0^(q1>>2)
produce the least significant bits, before the +1)>>1, that e and f are
actually computed with, and those do not in general match the correct one
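These LSB claims rest on x^y and x+y (or x-y) always having the same lowest bit. A brute-force C sketch with hypothetical helper names; it assumes two's complement ints, so & 1 of a negative sum still gives its parity:

```c
#include <assert.h>

/* Floor division by 2^n (an arithmetic right shift) that does not rely on
   the implementation-defined result of >> on negative ints. */
static int sar(int x, int n)
{
    assert(n >= 0 && n < 30);
    return x >= 0 ? x >> n : -((-x + (1 << n) - 1) >> n);
}

/* Counts inputs where a, b or c disagree in the lowest bit with the sum
   they stand in for. Scans all p1, q1 and a grid of p0, q0; step 17 hits
   0 and 255 and every residue mod 4. */
static long count_parity_mismatches(void)
{
    long bad = 0;
    for (int p1 = 0; p1 < 256; p1++)
        for (int q1 = 0; q1 < 256; q1++)
            for (int p0 = 0; p0 < 256; p0 += 17)
                for (int q0 = 0; q0 < 256; q0 += 17) {
                    int D = sar(p1 - q1, 2);
                    int a = p0 ^ q0 ^ D;
                    int b = p0 ^ (q1 >> 2);
                    int c = q0 ^ (p1 >> 2);
                    bad += (a & 1) != ((q0 - p0 + D) & 1);     /* exact sum */
                    bad += (b & 1) != ((p0 + (q1 >> 2)) & 1);  /* sum inside f */
                    bad += (c & 1) != ((q0 + (p1 >> 2)) & 1);  /* sum inside e */
                }
    return bad;
}
```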
now, how do the rounding differences look and behave?
(p1>>2)-(q1>>2)-1 == (p1-q1)>>2 iff (p1&3) < (q1&3), otherwise they are
equal: (p1>>2)-(q1>>2) == (p1-q1)>>2 (here >> rounds toward minus infinity)
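A quick exhaustive check of that rounding rule over all byte pairs p1, q1. sar is a hypothetical floor-shift helper standing in for an arithmetic >>:

```c
#include <assert.h>

/* Floor division by 2^n (an arithmetic right shift) that does not rely on
   the implementation-defined result of >> on negative ints. */
static int sar(int x, int n)
{
    assert(n >= 0 && n < 30);
    return x >= 0 ? x >> n : -((-x + (1 << n) - 1) >> n);
}

/* (p1>>2)-(q1>>2) overshoots the floored (p1-q1)>>2 by exactly X, where
   X = 1 iff the two bits shifted out of p1 are smaller than those of q1. */
static int count_shift_mismatches(void)
{
    int bad = 0;
    for (int p1 = 0; p1 < 256; p1++)
        for (int q1 = 0; q1 < 256; q1++) {
            int X = (p1 & 3) < (q1 & 3);
            bad += (p1 >> 2) - (q1 >> 2) - X != sar(p1 - q1, 2);
        }
    return bad;
}
```

For example p1 = 0, q1 = 1 gives X = 1 and (p1-q1)>>2 = -1, one less than (p1>>2)-(q1>>2) = 0.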
so we could fix this part by changing f to
f = ((p0+(q1>>2)+1+X)>>1) where X is 1 iff (p1&3) < (q1&3)
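can we detect this case from the LSBs of a, b, c?
yes, X = (a^c^b)&1: p0 and q0 cancel, and the LSB of
((p1-q1)>>2)^(p1>>2)^(q1>>2) is the parity of 2*(p1>>2)-X, which is X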
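The detection of X can be checked the same way. A sketch with hypothetical names, brute force over all p1, q1 and a grid of p0, q0:

```c
#include <assert.h>

/* Floor division by 2^n (an arithmetic right shift) that does not rely on
   the implementation-defined result of >> on negative ints. */
static int sar(int x, int n)
{
    assert(n >= 0 && n < 30);
    return x >= 0 ? x >> n : -((-x + (1 << n) - 1) >> n);
}

/* Counts inputs where (a^c^b)&1 fails to equal X. */
static long count_x_mismatches(void)
{
    long bad = 0;
    for (int p1 = 0; p1 < 256; p1++)
        for (int q1 = 0; q1 < 256; q1++)
            for (int p0 = 0; p0 < 256; p0 += 17)
                for (int q0 = 0; q0 < 256; q0 += 17) {
                    int a = p0 ^ q0 ^ sar(p1 - q1, 2);
                    int b = p0 ^ (q1 >> 2);
                    int c = q0 ^ (p1 >> 2);
                    int X = (p1 & 3) < (q1 & 3);   /* the case to detect */
                    bad += ((a ^ c ^ b) & 1) != X;
                }
    return bad;
}
```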
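write A = q0+(p1>>2) and B = p0+(q1>>2)+X, so the exact result is (A-B+1)>>1,
e = (A+1)>>1 and the unmodified f = ((p0+(q1>>2)+1)>>1) = (B+1-X)>>1; then
for X=0: ((A+1)>>1) - ((B+1)>>1) + 1 == (A-B+1)>>1 iff B&1=1 and A&1=0,
otherwise they are equal without the +1
for X=1: ((A+1)>>1) - (B>>1) - 1 == (A-B+1)>>1 iff B&1=1 and A&1=1,
otherwise they are equal without the -1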
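Those two rounding identities hold for any integers A and B; a brute-force sketch over the range that can occur here (hypothetical helper names, sar standing in for a floored >>):

```c
#include <assert.h>

/* Floor division by 2^n (an arithmetic right shift) that does not rely on
   the implementation-defined result of >> on negative ints. */
static int sar(int x, int n)
{
    assert(n >= 0 && n < 30);
    return x >= 0 ? x >> n : -((-x + (1 << n) - 1) >> n);
}

/* Checks both identities for every A, B in 0..400, which covers
   q0+(p1>>2) <= 318 and p0+(q1>>2)+X <= 319. */
static int count_rounding_mismatches(void)
{
    int bad = 0;
    for (int A = 0; A <= 400; A++)
        for (int B = 0; B <= 400; B++) {
            int want = sar(A - B + 1, 1);
            int fix0 = (B & 1) && !(A & 1);   /* X == 0: add 1 */
            int fix1 = (B & 1) && (A & 1);    /* X == 1: subtract 1 */
            bad += ((A + 1) >> 1) - ((B + 1) >> 1) + fix0 != want;
            bad += ((A + 1) >> 1) - (B >> 1) - fix1 != want;
        }
    return bad;
}
```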
since c&1 == A&1 and, for X=1, b&1 == (B-1)&1 -> we must subtract 1 iff b&1=0 and c&1=1
so for the X=0 case we need to correct by (b&~c)&1
and for the X=1 case we need to correct by -((~b&c)&1)
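Before the two corrections are merged into one mask, the two-branch version can be written out and compared with the reference formula. A scalar C sketch with hypothetical function names; it illustrates the derivation and is not the MMX code:

```c
#include <assert.h>

/* Floor division by 2^n (an arithmetic right shift) that does not rely on
   the implementation-defined result of >> on negative ints. */
static int sar(int x, int n)
{
    assert(n >= 0 && n < 30);
    return x >= 0 ? x >> n : -((-x + (1 << n) - 1) >> n);
}

/* Reference: the formula as written in the question. */
static int delta_ref(int p1, int p0, int q0, int q1)
{
    return sar(q0 - p0 + sar(p1 - q1, 2) + 1, 1);
}

/* Uncorrected e - f plus a +/-1 fix chosen by X. */
static int delta_two_branch(int p1, int p0, int q0, int q1)
{
    int a = p0 ^ q0 ^ sar(p1 - q1, 2);
    int b = p0 ^ (q1 >> 2);
    int c = q0 ^ (p1 >> 2);
    int X = (a ^ c ^ b) & 1;                /* 1 iff (p1&3) < (q1&3) */
    int e = (q0 + (p1 >> 2) + 1) >> 1;
    int f = (p0 + (q1 >> 2) + 1) >> 1;
    int fix = X == 0 ? ((b & ~c) & 1)       /* q0+(p1>>2) even, p0+(q1>>2) odd */
                     : -((~b & c) & 1);     /* q0+(p1>>2) odd, p0+(q1>>2) even */
    return e - f + fix;
}

/* Compares both over all p1, q1 and a grid of p0, q0. */
static long count_two_branch_mismatches(void)
{
    long bad = 0;
    for (int p1 = 0; p1 < 256; p1++)
        for (int q1 = 0; q1 < 256; q1++)
            for (int p0 = 0; p0 < 256; p0 += 17)
                for (int q0 = 0; q0 < 256; q0 += 17)
                    bad += delta_two_branch(p1, p0, q0, q1) != delta_ref(p1, p0, q0, q1);
    return bad;
}
```

For example p1=0, p0=0, q0=1, q1=1 is an X=1 case where the uncorrected e-f is 1 but the exact result is 0, and p1=0, p0=1, q0=0, q1=0 is an X=0 case where e-f is -1 but the result is 0.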
-> X == (~a)&1 if we limit ourselves to the cases where a correction is needed, so one mask d covers both:
f = ((p0+(q1>>2)+1)>>1) + (d&~a)
e = ((q0+(p1>>2)+1)>>1) + (d&a)
d = (c^b)&~(b^a)&1
should be obvious based upon the above
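A scalar C model of one byte lane of the final formula, with d = (c^b)&~(b^a)&1, compared against the reference (q0-p0+((p1-q1)>>2)+1)>>1. The function names are hypothetical, and computing a from the byte-wrapped (uint8_t)(p1 - q1) >> 2 is an assumption of this sketch rather than how h264dsp_mmx.c does it; it preserves the lowest bit, which is all d&a and d&~a use:

```c
#include <assert.h>
#include <stdint.h>

/* Floor division by 2^n (an arithmetic right shift) that does not rely on
   the implementation-defined result of >> on negative ints. */
static int sar(int x, int n)
{
    assert(n >= 0 && n < 30);
    return x >= 0 ? x >> n : -((-x + (1 << n) - 1) >> n);
}

/* Reference: the formula as written in the question. */
static int delta_ref(int p1, int p0, int q0, int q1)
{
    return sar(q0 - p0 + sar(p1 - q1, 2) + 1, 1);
}

/* a, b, c, d, e and f are all stored in uint8_t; the wider int sums only
   model the rounding average (x + y + 1) >> 1 inside one operation. */
static int delta_split(uint8_t p1, uint8_t p0, uint8_t q0, uint8_t q1)
{
    /* byte-wrapped p1-q1, shifted logically: same LSB as the floored >>2 */
    uint8_t a = (uint8_t)(p0 ^ q0 ^ ((uint8_t)(p1 - q1) >> 2));
    uint8_t b = (uint8_t)(p0 ^ (q1 >> 2));
    uint8_t c = (uint8_t)(q0 ^ (p1 >> 2));
    uint8_t d = (uint8_t)((c ^ b) & ~(b ^ a) & 1);  /* 1 iff a +/-1 fix is needed */
    uint8_t f = (uint8_t)(((p0 + (q1 >> 2) + 1) >> 1) + (d & ~a)); /* lowers delta */
    uint8_t e = (uint8_t)(((q0 + (p1 >> 2) + 1) >> 1) + (d & a));  /* raises delta */
    return (int)e - (int)f;                          /* delta = e - f */
}

/* Compares both over all p1, q1 and a grid of p0, q0 (step 17). */
static long count_split_mismatches(void)
{
    long bad = 0;
    for (int p1 = 0; p1 < 256; p1++)
        for (int q1 = 0; q1 < 256; q1++)
            for (int p0 = 0; p0 < 256; p0 += 17)
                for (int q0 = 0; q0 < 256; q0 += 17)
                    bad += delta_split((uint8_t)p1, (uint8_t)p0, (uint8_t)q0, (uint8_t)q1)
                           != delta_ref(p1, p0, q0, q1);
    return bad;
}
```

Replacing the final &1 in d with ^1 (the typo from the question) breaks it immediately: for p0=q0=p1=q1=0 the split version then returns -1 instead of 0.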
btw, I'm CCing this to ffmpeg-dev as it might be interesting for others
too
anyone got a nicer derivation/proof?
or even a faster implementation?
[...]
--
Michael