[FFmpeg-devel] Fix MMX dct_quantize for non zigzag_direct scans
Wed May 14 17:23:09 CEST 2008
On Wed, May 14, 2008 at 04:05:28PM +0100, Ramiro Polla wrote:
> Michael Niedermayer wrote:
>> On Tue, May 13, 2008 at 04:39:39AM +0100, Ramiro Polla wrote:
>>> If I use dct_quantize_c for the mimic encoder, all goes well. But if I
>>> use MMX dct_quantize, it doesn't work the same. The last_non_zero value
>>> returned is wrong. There is a note about it in
>>> libavcodec/mpegvideo_enc.c and it searches again for the last_non_zero
>>> coeff after all calls to dct_quantize if they're not dct_quantize_c.
>>> MMX dct_quantize uses inv_zigzag_direct16 independing of the scantable,
>>> so that's why (I think) it returns the wrong value. If I change
>>> ScanTable to also have an inverse table on MMX and use that instead,
>>> it returns the right value. But MMX dct_quantize uses custom code to
>>> permute the block in the end, and with this new last_non_zero value
>>> (from inverse) it doesn't copy the blocks correctly. If I add
>>> non-optimized block permuting code for the cases of alternate scan, it
>>> works again.
>>> I thought maybe the current code always returned a larger last_non_zero
>>> value, but it doesn't happen all the time (although it happens most of
>>> the time).
>>> But then I got confused... Isn't that custom permute code only good for
>>> ff_zigzag_direct? Shouldn't it copy the wrong values for alternate
>>> scans? Or are the values being permuted somewhere else again?
>>> Anyways, attached patch gives same resulting file for tempete.cif
>>> encoded in mpeg4 with -flags +alt.
>> what happens with other optimized dct quantizers (non x86)
> bfin finds last_non_zero coeff using a loop. Altivec uses an inverse table
> like the one I'm proposing to be used for MMX too. So they do return the
> appropriate value.
> bfin and altivec optimized dct_quantize use ff_block_permute for the output
> block (same as dct_quantize_c). Compn confirmed on IRC that they give the
> same output.
>> and try the regression tests with forced mmx quant (by default it uses C)
>> to ensure that the patch does not change the mmx output.
> Regression tests are consistent when MMX quantization is used (that is: no
> bitexact, and auto idct and dct).
>> and id like to see benchmarks as well, so we can be sure this doesnt
>> slow the code down
> Adding custom permutation code for ff_alternate_vertical_scan the same way
> it's done for ff_zigzag_direct gives these results (the source file is
> properly cached in memory, 10 runs removing the highest and lowest):
> ./ffmpeg_g -benchmark -s 352x288 -i paris.cif -flags +alt -vcodec mpeg4 -y
> -f rawvideo /dev/null
> ref new
> avg 2.7905 2.7390
> stdev 0.0318 0.0287
I do not care at all about alternate_scan speed. I care about zigzag_direct
speed! Its whats used 99% of the time.
even a 0.1% speedloss for zigzag means rejected patch!
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Old school: Use the lowest level language in which you can solve the problem
New school: Use the highest level language in which the latest supercomputer
can solve the problem without the user falling asleep waiting.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Size: 189 bytes
Desc: Digital signature
More information about the ffmpeg-devel