[FFmpeg-devel] Fix MMX dct_quantize for non zigzag_direct scans

Wed May 14 17:05:28 CEST 2008

Hello,

Michael Niedermayer wrote:
> On Tue, May 13, 2008 at 04:39:39AM +0100, Ramiro Polla wrote:
>> Hello,
>>
>> If I use dct_quantize_c for the mimic encoder, all goes well. But if I
>> use MMX dct_quantize, it doesn't work the same. The last_non_zero value
>> returned is wrong. There is a note about it in
>> libavcodec/mpegvideo_enc.c and it searches again for the last_non_zero
>> coeff after all calls to dct_quantize if they're not dct_quantize_c.
>>
>> MMX dct_quantize uses inv_zigzag_direct16 independing of the scantable,
>> so that's why (I think) it returns the wrong value. If I change
>> ScanTable to also have an inverse[64] table on MMX and use that instead,
>> it returns the right value. But MMX dct_quantize uses custom code to
>> permute the block in the end, and with this new last_non_zero value
>> (from inverse[]) it doesn't copy the blocks correctly. If I add
>> non-optimized block permuting code for the cases of alternate scan, it
>> works again.
>>
>> I thought maybe the current code always returned a larger last_non_zero
>> value, but it doesn't happen all the time (although it happens most of
>> the time).
>>
>> But then I got confused... Isn't that custom permute code only good for
>> ff_zigzag_direct? Shouldn't it copy the wrong values for alternate
>> scans? Or are the values being permuted somewhere else again?
>>
>> Anyways, attached patch gives same resulting file for tempete.cif
>> encoded in mpeg4 with -flags +alt.
> 
> what happens with other optimized dct quantizers (non x86)

bfin finds last_non_zero coeff using a loop. Altivec uses an inverse 
table like the one I'm proposing to be used for MMX too. So they do 
return the appropriate value.

bfin and altivec optimized dct_quantize use ff_block_permute for the 
output block (same as  dct_quantize_c). Compn confirmed on IRC that they 
give the same output.

> and try the regression tests with forced mmx quant (by default it uses C)
> to ensure that the patch does not change the mmx output.

Regression tests are consistent when MMX quantization is used (that is: 
no bitexact, and auto idct and dct).

> and id like to see benchmarks as well, so we can be sure this doesnt
> slow the code down

Adding custom permutation code for ff_alternate_vertical_scan the same 
way it's done for ff_zigzag_direct gives these results (the source file 
is properly cached in memory, 10 runs removing the highest and lowest):

./ffmpeg_g -benchmark -s 352x288 -i paris.cif -flags +alt -vcodec mpeg4 
-y -f rawvideo /dev/null

       ref     new
avg   2.7905  2.7390
stdev 0.0318  0.0287

The code also adds a fallback routine if it's not zigzag_direct or the 
usual alternate_scan.

Ramiro Polla
-------------- next part --------------
A non-text attachment was scrubbed...
Name: i3863.diff
Type: text/x-patch
Size: 8517 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080514/773bb1ae/attachment.bin>