[FFmpeg-devel] Fix MMX dct_quantize for non zigzag_direct scans
Wed May 14 17:05:28 CEST 2008
Michael Niedermayer wrote:
> On Tue, May 13, 2008 at 04:39:39AM +0100, Ramiro Polla wrote:
>> If I use dct_quantize_c for the mimic encoder, all goes well. But if I
>> use MMX dct_quantize, it doesn't work the same. The last_non_zero value
>> returned is wrong. There is a note about it in
>> libavcodec/mpegvideo_enc.c and it searches again for the last_non_zero
>> coeff after all calls to dct_quantize if they're not dct_quantize_c.
>> MMX dct_quantize uses inv_zigzag_direct16 independing of the scantable,
>> so that's why (I think) it returns the wrong value. If I change
>> ScanTable to also have an inverse table on MMX and use that instead,
>> it returns the right value. But MMX dct_quantize uses custom code to
>> permute the block in the end, and with this new last_non_zero value
>> (from inverse) it doesn't copy the blocks correctly. If I add
>> non-optimized block permuting code for the cases of alternate scan, it
>> works again.
>> I thought maybe the current code always returned a larger last_non_zero
>> value, but it doesn't happen all the time (although it happens most of
>> the time).
>> But then I got confused... Isn't that custom permute code only good for
>> ff_zigzag_direct? Shouldn't it copy the wrong values for alternate
>> scans? Or are the values being permuted somewhere else again?
>> Anyways, attached patch gives same resulting file for tempete.cif
>> encoded in mpeg4 with -flags +alt.
> what happens with other optimized dct quantizers (non x86)
bfin finds last_non_zero coeff using a loop. Altivec uses an inverse
table like the one I'm proposing to be used for MMX too. So they do
return the appropriate value.
bfin and altivec optimized dct_quantize use ff_block_permute for the
output block (same as dct_quantize_c). Compn confirmed on IRC that they
give the same output.
> and try the regression tests with forced mmx quant (by default it uses C)
> to ensure that the patch does not change the mmx output.
Regression tests are consistent when MMX quantization is used (that is:
no bitexact, and auto idct and dct).
> and id like to see benchmarks as well, so we can be sure this doesnt
> slow the code down
Adding custom permutation code for ff_alternate_vertical_scan the same
way it's done for ff_zigzag_direct gives these results (the source file
is properly cached in memory, 10 runs removing the highest and lowest):
./ffmpeg_g -benchmark -s 352x288 -i paris.cif -flags +alt -vcodec mpeg4
-y -f rawvideo /dev/null
avg 2.7905 2.7390
stdev 0.0318 0.0287
The code also adds a fallback routine if it's not zigzag_direct or the
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 8517 bytes
Desc: not available
More information about the ffmpeg-devel