[Ffmpeg-devel] [PATCH] H.264 deblocking mmx
Thu Apr 28 22:08:01 CEST 2005
On Mon, 2005-04-25 at 00:39, Loren Merritt wrote:
> I noticed that the inloop deblocking filter was taking a large fraction of
> the decode time, and it is inherently parallel, so...
Just some remarks about the patch:
a) The chroma deblocking filters 4 pixels at a time, whereas
it seems to me that only 2 chroma pixels share the same
strength (deduced from the contents of the co-located 4x4 luma block).
Moreover, for MBAFF, you sometimes have to filter only 1
vertical chroma sample at a time (in the case of Field->Frame or
Frame->Field vertical filtering).
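To illustrate point a), here is a rough C sketch of how the 8 chroma samples along an edge would map onto filter strengths. The name `expand_chroma_bs` and the `samples_per_bs` parameter are hypothetical and not taken from the patch; 2 stands for the normal case and 1 for the MBAFF mixed Field/Frame case.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: expand the strength table of one chroma edge into one
 * strength per chroma sample (8 samples along the edge).
 * samples_per_bs = 2: normal case, two chroma samples share one strength,
 *                     so bs[] holds 4 entries.
 * samples_per_bs = 1: MBAFF mixed field/frame case, one sample per strength,
 *                     so bs[] holds 8 entries. */
static void expand_chroma_bs(const uint8_t *bs, size_t samples_per_bs,
                             uint8_t out[8])
{
    for (size_t i = 0; i < 8; i++)
        out[i] = bs[i / samples_per_bs];
}
```

With `samples_per_bs = 2`, a routine that handles 4 pixels at once spans two different strengths, so it needs a per-pixel strength (or mask) rather than a single shared value.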
b) The ASM code computes the ABS(a-b) value and afterwards
compares it to Alpha/Beta, using 16-bit words.
But in fact only the result of the test (not the abs value itself)
matters, and it could advantageously be computed using
unsigned 8-bit values only, since that would both avoid an 8b->16b
conversion and allow testing both bounds (-Alpha < a-b < Alpha)
in one shot.
Here's an example for the test ABS(P0-Q0)<Alpha, using 8b only:
input: mm7 = Alpha value, 8bits, replicated 8 times
movd mm0, [Q0] ; four pixels 'Q0' in lower 32bits
punpckldq mm0, [P0] ; four pixels 'P0' in higher 32bits
pshufw mm1, mm0, 01001110b ; P0 | Q0 (Swap P0 and Q0)
paddusb mm0, mm7 ; Q0+Alpha | P0+Alpha
psubusb mm0, mm1 ; Q0+Alpha-P0 | P0+Alpha-Q0
At this point, each byte in the lower 32 bits of mm0 is zero if P0>=Q0+Alpha
for that pixel, and each byte in the higher 32 bits is zero if Q0>=P0+Alpha.
A pixel passes the test only if both of its bytes are non-zero.
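For readers who prefer C, the per-byte behaviour of the sequence above can be modelled in scalar code. This is only a sketch: `addus8`, `subus8`, `alpha_test_lanes` and `alpha_test_passes` are hypothetical names, and each call models a single pixel rather than a whole register.

```c
#include <stdint.h>

/* Saturating unsigned byte add, as paddusb does per lane. */
static uint8_t addus8(uint8_t a, uint8_t b)
{
    unsigned s = (unsigned)a + b;
    return (uint8_t)(s > 255 ? 255 : s);
}

/* Saturating unsigned byte subtract, as psubusb does per lane. */
static uint8_t subus8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a > b ? a - b : 0);
}

/* Hypothetical scalar model of one pixel of the test.
 * lo models the byte in the lower half of mm0, hi the one in the upper half. */
static void alpha_test_lanes(uint8_t p0, uint8_t q0, uint8_t alpha,
                             uint8_t *lo, uint8_t *hi)
{
    *lo = subus8(addus8(q0, alpha), p0); /* zero if P0 >= min(Q0+Alpha, 255) */
    *hi = subus8(addus8(p0, alpha), q0); /* zero if Q0 >= min(P0+Alpha, 255) */
}

/* 1 if both lanes are non-zero, i.e. the 8-bit test reports ABS(P0-Q0) < Alpha. */
static int alpha_test_passes(uint8_t p0, uint8_t q0, uint8_t alpha)
{
    uint8_t lo, hi;
    alpha_test_lanes(p0, q0, alpha, &lo, &hi);
    return lo != 0 && hi != 0;
}
```

One caveat this model makes visible: because the add saturates at 255, a pixel equal to 255 is reported as failing whenever the other pixel plus Alpha also reaches 255 (for example P0=255, Q0=250, Alpha=10), so that corner case may need separate handling.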
Note: you can repeat/pair the above instructions for the other tests
(ABS(P1-P0)<Beta, etc...), and accumulate the results in mm0 using
a 'pminub' instruction (a byte stays non-zero only if every test left it non-zero)...
In the end, when one wants the final result:
pminub mm0, [One] ; mask is now made of '0' or '1'. [One] is 1, replicated 8 times
pshufw mm1, mm0, 01001110b ; Swap the hi/lo 32 bits
pand mm1, mm0 ; => each byte of mm1 is 1 if both halves passed (the pixel should be filtered), 0 otherwise.
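As a cross-check of the intended result, independent of the exact instruction sequence, here is a scalar C sketch of the final decision for one pixel. `filter_flag` and its parameters are hypothetical names; the only assumption is that each test leaves a non-zero byte in a half of the register when that half of the test passed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of the final mask for one pixel.
 * lo[t] and hi[t] are the bytes that test t left in the lower and upper half
 * of the register; a byte is non-zero when that half of the test passed. */
static uint8_t filter_flag(const uint8_t *lo, const uint8_t *hi, size_t ntests)
{
    uint8_t flag = 1;
    for (size_t t = 0; t < ntests; t++) {
        /* the pixel is filtered only if both halves of every test passed */
        if (lo[t] == 0 || hi[t] == 0)
            flag = 0;
    }
    return flag;
}
```

Comparing the SIMD mask against such a reference over all byte values is a cheap way to validate the assembly.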
Hope it helps.
Before you ask why I don't supply a patch for that: simply because I very much dislike inline ASM code.
I can hardly read it, let alone write any. But fortunately, Michael is around here ;)