[FFmpeg-devel] A further potential optimization of H264 deblocking
Guy Bonneau
gbonneau
Tue May 29 19:58:08 CEST 2007
While studying the deblocking algorithm of the H.264 specification, I
dug into the ffmpeg MMX implementation to understand how it is implemented.
I had a hard time understanding the optimization, so I worked through the
binary math. While doing the math, I think I may have found a small
further optimization.
Let us start from:
(((q0-p0)<<2) + (p1-q1) + 4) >> 3     (1)
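The 2 least significant bits of (p1-q1) do not contribute to the
result: the other two terms are multiples of 4, so these bits cannot
produce a carry, and the shift then discards them. Thus they can be dropped.
And we can rewrite the equation as:
((q0-p0) + ((p1-q1) >> 2) + 1) >> 1     (2)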
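
The step from (1) to (2) can be checked by brute force. What follows is only a minimal sketch: the names floor_shr, delta_eq1, delta_eq2 and count_mismatches_eq1_eq2 are made up for this illustration, and ">>" is taken to round toward minus infinity, which the helper spells out so the result does not depend on how the compiler shifts negative ints. Both forms depend only on the two differences, so it is enough to loop over those.

```c
/* Shift right by n, rounding toward minus infinity, without relying on
 * implementation-defined right shifts of negative ints. */
static int floor_shr(int x, int n)
{
    return x >= 0 ? x >> n : -((-x + (1 << n) - 1) >> n);
}

/* Equation (1), with dq = q0 - p0 and dp = p1 - q1. */
static int delta_eq1(int dq, int dp)
{
    return floor_shr(dq * 4 + dp + 4, 3);
}

/* Equation (2), same arguments. */
static int delta_eq2(int dq, int dp)
{
    return floor_shr(dq + floor_shr(dp, 2) + 1, 1);
}

/* Count the (dq, dp) pairs for which the two forms disagree. Differences
 * of two unsigned 8-bit values lie in -255..255. */
static int count_mismatches_eq1_eq2(void)
{
    int bad = 0;
    for (int dq = -255; dq <= 255; dq++)
        for (int dp = -255; dp <= 255; dp++)
            bad += delta_eq1(dq, dp) != delta_eq2(dq, dp);
    return bad;
}
```

Calling count_mismatches_eq1_eq2() from a small main() and checking that it returns 0 covers every possible pair of differences.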
We have the identity
(a-b) = a+(~b)+1 - 256     (note: a and b are unsigned 8-bit values)
Thus we can rewrite (2):
(((q0+~p0 + 1 - 256)) + ((p1+~q1 + 1 - 256) >> 2) + 1) >> 1
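
The identity itself is easy to confirm for every pair of bytes. A minimal sketch, assuming "~b" means the 8-bit complement 255 - b; the helper names not8 and count_identity_failures are hypothetical:

```c
/* 8-bit complement of b (0..255), i.e. 255 - b. */
static int not8(int b)
{
    return b ^ 0xFF;
}

/* Count the byte pairs (a, b) for which a - b != a + ~b + 1 - 256. */
static int count_identity_failures(void)
{
    int bad = 0;
    for (int a = 0; a <= 255; a++)
        for (int b = 0; b <= 255; b++)
            bad += (a - b) != (a + not8(b) + 1 - 256);
    return bad;
}
```

Note that a + ~b + 1 itself ranges from 1 to 511, so it needs 9 bits before the 256 is subtracted.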
To make use of PAVGB, which computes (a + b + 1) >> 1 on unsigned bytes,
we can do some binary arithmetic:
((q0+~p0 + 1 - 256) + ((p1+~q1 + 1) >> 2) - 64 + 1) >> 1
((q0+~p0 + 1) + (PAVGB(p1,~q1) >> 1) - 256 - 64 + 1) >> 1
((q0+~p0 + 1) + ((PAVGB(p1,~q1) + 4) >> 1) - 256 - 64 - (4>>1) + 1) >> 1
((q0+~p0 + 1) + ((PAVGB(p1,~q1) + 3 + 1) >> 1) - 256 - 64 - (4>>1) + 1) >> 1
((q0+~p0 + 1) + ((PAVGB(p1,~q1) + 3 + 1) >> 1) - 256 - 64 - 2 + 1) >> 1
((q0+~p0 + 1) + PAVGB(PAVGB(p1,~q1), 3) - 256 - 64 - 2 + 1) >> 1
(((q0+~p0 + 1) + PAVGB(PAVGB(p1,~q1), 3) + 1) >> 1) - 128 - 33
Let val1 = PAVGB(PAVGB(p1,~q1), 3)
Let val2 = q0+~p0 + 1
Then we have
(val1 + val2 + 1) >> 1 = PAVGB(val2, val1)
and the whole expression becomes
PAVGB((q0+~p0+1), PAVGB(PAVGB(p1,~q1), 3)) - 161
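
The final form can be compared against (1) in plain integer arithmetic. This is only a sketch under stated assumptions: avg_round stands in for PAVGB but works on ints, so nothing wraps at 8 bits, and all function names are made up for the illustration. In particular val2 = q0+~p0+1 ranges from 1 to 511 here, which is wider than a byte lane, so this check says nothing yet about an 8-bit implementation.

```c
/* Rounded average as PAVGB defines it, but on plain ints (no byte wrap). */
static int avg_round(int a, int b)
{
    return (a + b + 1) >> 1;
}

/* Equation (1), with the final shift rounding toward minus infinity. */
static int delta_reference(int p1, int p0, int q0, int q1)
{
    int x = (q0 - p0) * 4 + (p1 - q1) + 4;
    return x >= 0 ? x >> 3 : -((-x + 7) >> 3);
}

/* PAVGB(val2, val1) - 161, evaluated without 8-bit truncation. */
static int delta_pavgb_form(int p1, int p0, int q0, int q1)
{
    int val1 = avg_round(avg_round(p1, q1 ^ 0xFF), 3);
    int val2 = q0 + (p0 ^ 0xFF) + 1;   /* 1..511, needs 9 bits */
    return avg_round(val2, val1) - 161;
}

/* Both forms depend only on q0 - p0 and p1 - q1, so one pixel pair per
 * difference is enough to cover every case. */
static int count_mismatches_pavgb_form(void)
{
    int bad = 0;
    for (int dq = -255; dq <= 255; dq++)
        for (int dp = -255; dp <= 255; dp++) {
            int q0 = dq > 0 ? dq : 0, p0 = dq < 0 ? -dq : 0;
            int p1 = dp > 0 ? dp : 0, q1 = dp < 0 ? -dp : 0;
            bad += delta_reference(p1, p0, q0, q1)
                != delta_pavgb_form(p1, p0, q0, q1);
        }
    return bad;
}
```

A return value of 0 from count_mismatches_pavgb_form() means the algebra holds as long as val2 keeps all 9 bits.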
I think it is possible to rewrite the code to:
#define H264_DEBLOCK_P0_Q0(pb_01, pb_3f)\
"pcmpeqb %%mm4 , %%mm4 \n\t"\
"pxor %%mm4 , %%mm3 \n\t"\
"pavgb %%mm0 , %%mm3 \n\t" /* (p1 - q1 + 256)>>1*/\
"pavgb "MANGLE(ff_pb_3)" , %%mm3 \n\t" /*(((p1 - q1 + 256)>>1)+4)>>1 = 64+2+(p1-q1)>>2*/\
"pxor %%mm1 , %%mm4 \n\t" /*~p0*/\
"paddb "#pb_01" , %%mm4 \n\t" /*~p0+1*/\
"paddb %%mm2 , %%mm4 \n\t" /* (q0 + ~p0 + 1)*/\
"pavgb %%mm4 , %%mm3 \n\t" /* d+128+33*/\
"movq "MANGLE(ff_pb_A1)" , %%mm6 \n\t"\
"psubusb %%mm3 , %%mm6 \n\t"\
"psubusb "MANGLE(ff_pb_A1)" , %%mm3 \n\t"\
"pminub %%mm7 , %%mm6 \n\t"\
"pminub %%mm7 , %%mm3 \n\t"\
"psubusb %%mm6 , %%mm1 \n\t"\
"psubusb %%mm3 , %%mm2 \n\t"\
"paddusb %%mm3 , %%mm1 \n\t"\
"paddusb %%mm6 , %%mm2 \n\t"
This saves 3 instructions out of 20.
Unfortunately I cannot try it myself since I'm using Windows,
and this is only theoretical. If someone can try it, I would
be interested to know whether it works or whether I missed something.
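
As a starting point for such a test, one byte lane of the sequence can be modelled in scalar C. This is a sketch resting on assumptions, not a replacement for running the real macro: the register roles (mm0 = p1, mm1 = p0, mm2 = q0, mm3 = q1, mm7 = clipping limit) are inferred from the comments in the macro, ff_pb_3 and ff_pb_A1 are taken to hold 3 and 0xA1 in every byte, pb_01 is taken to hold 1, and the helper functions are simply named after the instructions they imitate.

```c
#include <stdint.h>

/* Scalar stand-ins for one byte lane of the MMX instructions. */
static uint8_t pavgb(uint8_t a, uint8_t b)   { return (uint8_t)((a + b + 1) >> 1); }
static uint8_t paddb(uint8_t a, uint8_t b)   { return (uint8_t)(a + b); } /* wraps mod 256 */
static uint8_t psubusb(uint8_t a, uint8_t b) { return (uint8_t)(a > b ? a - b : 0); }
static uint8_t paddusb(uint8_t a, uint8_t b) { return (uint8_t)(a + b > 255 ? 255 : a + b); }
static uint8_t pminub(uint8_t a, uint8_t b)  { return a < b ? a : b; }

/* Value left in mm3 by the third pavgb, the lane commented "d+128+33". */
static uint8_t lane_biased_delta(uint8_t p1, uint8_t p0, uint8_t q0, uint8_t q1)
{
    uint8_t m3 = pavgb(p1, (uint8_t)(q1 ^ 0xFF)); /* (p1 - q1 + 256) >> 1 */
    m3 = pavgb(m3, 3);
    uint8_t m4 = (uint8_t)(p0 ^ 0xFF);            /* ~p0 */
    m4 = paddb(m4, 1);                            /* ~p0 + 1 */
    m4 = paddb(m4, q0);                           /* q0 + ~p0 + 1, low 8 bits only */
    return pavgb(m4, m3);
}

/* Tail of the macro: split around 0xA1, clip to tc, update p0 and q0. */
static void lane_filter(uint8_t p1, uint8_t *p0, uint8_t *q0, uint8_t q1, uint8_t tc)
{
    uint8_t m3 = lane_biased_delta(p1, *p0, *q0, q1);
    uint8_t m6 = psubusb(0xA1, m3);   /* amount below the 0xA1 bias */
    m3 = psubusb(m3, 0xA1);           /* amount above the 0xA1 bias */
    m6 = pminub(m6, tc);
    m3 = pminub(m3, tc);
    *p0 = psubusb(*p0, m6);
    *q0 = psubusb(*q0, m3);
    *p0 = paddusb(*p0, m3);
    *q0 = paddusb(*q0, m6);
}
```

In this model, p1 = p0 = q1 = 100 with q0 = 99 gives 161 from lane_biased_delta, which is d + 161 with d = 0 as expected. With q0 = 100 as well, the byte-wide paddb wraps q0 + ~p0 + 1 = 256 to 0 and the lane holds 33 instead of 161. If the model is faithful, the case q0 >= p0 is the one to look at first on real hardware.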
Thanks
Guy Bonneau