[FFmpeg-devel] Fix MMX dct_quantize for non zigzag_direct scans
Thu May 15 00:25:43 CEST 2008
>>> and I'd like to see benchmarks as well, so we can be sure this doesn't
>>> slow the code down
>> Adding custom permutation code for ff_alternate_vertical_scan the same way
>> it's done for ff_zigzag_direct gives these results (the source file is
>> properly cached in memory, 10 runs removing the highest and lowest):
>> ./ffmpeg_g -benchmark -s 352x288 -i paris.cif -flags +alt -vcodec mpeg4 -y
>> -f rawvideo /dev/null
>>          ref      new
>> avg     2.7905   2.7390
>> stdev   0.0318   0.0287
> I do not care at all about alternate_scan speed. I care about zigzag_direct
> speed! It's what's used 99% of the time.
> Even a 0.1% speed loss for zigzag means a rejected patch!
Tested with a bigger CIF sample:
         ref      new
avg     7.9774   7.9652
stdev   0.0433   0.0396
I ran the tests in init 1 with everything stopped that I could stop (no
network, no log daemon, no fs journaling, no cron, no other daemons), and
still couldn't get the stdev below the 0.1% you want.
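To put the numbers in context: 0.0433 / 7.9774 is roughly 0.5%, while the ref/new difference (0.0122) is about 0.15% of the mean, so it sits well inside the run-to-run noise. Below is a minimal sketch of how such figures can be computed from a set of timings, dropping the highest and lowest run first. The names `RunStats`, `trimmed_stats` and `rel_stdev_below` are hypothetical helpers for illustration, not part of FFmpeg.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type, not part of FFmpeg. */
typedef struct {
    double mean;      /* mean of the kept runs */
    double variance;  /* sample variance of the kept runs (stdev squared) */
} RunStats;

/* Drop one lowest and one highest timing, then summarize the rest. */
static RunStats trimmed_stats(const double *t, size_t n)
{
    size_t lo = 0, hi = 0, kept = n - 2;
    double sum = 0.0, sq = 0.0;
    RunStats s;

    assert(n >= 4);  /* need at least two kept runs for a sample variance */

    for (size_t i = 1; i < n; i++) {
        if (t[i] <  t[lo]) lo = i;  /* first minimum */
        if (t[i] >= t[hi]) hi = i;  /* last maximum, so lo != hi even if all equal */
    }
    for (size_t i = 0; i < n; i++)
        if (i != lo && i != hi)
            sum += t[i];
    s.mean = sum / (double)kept;

    for (size_t i = 0; i < n; i++)
        if (i != lo && i != hi)
            sq += (t[i] - s.mean) * (t[i] - s.mean);
    s.variance = sq / (double)(kept - 1);
    return s;
}

/* True if stdev / mean < frac. Squares are compared so no libm is needed. */
static int rel_stdev_below(const RunStats *s, double frac)
{
    double limit = frac * s->mean;
    return s->variance < limit * limit;
}
```

With this, `rel_stdev_below(&stats, 0.001)` would express the 0.1% criterion directly.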
Benchmarking with START/STOP_TIMER isn't very good either, since the time
each run takes varies with last_non_zero. Also, the patch changes not only
the MMX code but also removes the hack in mpegvideo_enc.c.
The changes for direct zigzag are basically:
- read inverse from context instead of from static array. (shouldn't
slow anything down)
- remove an if(s->alternate_scan) from mpegvideo_enc.c
- add an if(s->intra_scantable.scantable == ff_zigzag_direct) to MMX code.
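The check in the last bullet can be sketched in plain C as follows. This is only an illustration of the idea, not the MMX code from the patch: the 4-entry tables, `DemoCtx` and `permute_coeffs` are made-up stand-ins, and the real blocks hold 64 coefficients. The table is compared by address, so the specialized path is taken only when the context points at the well-known zigzag table itself.

```c
#include <assert.h>
#include <stdint.h>

/* Made-up 4-entry stand-ins for ff_zigzag_direct and an alternate scan. */
static const uint8_t demo_zigzag[4] = {0, 2, 1, 3};
static const uint8_t demo_alt[4]    = {0, 1, 3, 2};

/* Hypothetical context: scan order in use plus the IDCT permutation. */
typedef struct {
    const uint8_t *scantable;
    const uint8_t *idct_perm;
} DemoCtx;

/* Copy coefficients 0..last_non_zero (in scan order) from tmp to block,
 * applying the IDCT permutation. Returns 1 if the specialized path ran. */
static int permute_coeffs(const DemoCtx *c, int16_t *block,
                          const int16_t *tmp, int last_non_zero)
{
    const uint8_t *p = c->idct_perm;

    assert(last_non_zero >= 0 && last_non_zero < 4);

    if (c->scantable == demo_zigzag) {
        /* Specialized path: scan order hardcoded and unrolled. */
        block[p[0]] = tmp[0]; if (last_non_zero < 1) return 1;
        block[p[2]] = tmp[2]; if (last_non_zero < 2) return 1;
        block[p[1]] = tmp[1]; if (last_non_zero < 3) return 1;
        block[p[3]] = tmp[3];
        return 1;
    }

    /* Generic path: works for any scan table. */
    for (int i = 0; i <= last_non_zero; i++) {
        int j = c->scantable[i];
        block[p[j]] = tmp[j];
    }
    return 0;
}
```

For the common zigzag case the extra cost is one pointer comparison per call; both paths are meant to produce the same output for the same scan order.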
I think this is a good solution that shows no measurable speed loss. I
can test more if someone knows a way to benchmark this with very high
accuracy (<0.1% stdev).
We could also have two versions of each function, one hardcoded to direct
zigzag and another more generic, which can be set in MPV_common_init_mmx()
(under #ifdef CONFIG_SMALL). I attached an example of what could be
done. Another way, if my first patch is accepted, would be to have an
av_always_inline dct_quantize_xxx_template(..., direct_zigzag) and
create dct_quantize_xxx_direct and dct_quantize_xxx_normal from it.
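A rough sketch of the template idea, in plain C rather than MMX. Everything here is illustrative: the 4-entry table, the trivial quantizer and the names `quantize_template`, `quantize_direct`, `quantize_normal` and `select_quantize` are assumptions, and `DEMO_ALWAYS_INLINE` stands in for av_always_inline. Since direct_zigzag is a compile-time constant in each wrapper, the compiler can drop the unused branch, so the direct version carries no runtime check.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__GNUC__)
#define DEMO_ALWAYS_INLINE inline __attribute__((always_inline))
#else
#define DEMO_ALWAYS_INLINE inline
#endif

#define DEMO_N 4  /* demo size; real blocks hold 64 coefficients */

/* Made-up stand-in for ff_zigzag_direct. */
static const uint8_t demo_zigzag[DEMO_N] = {0, 2, 1, 3};

/* Quantize in place by integer division and return the scan position of
 * the last non-zero coefficient (-1 if none). */
static DEMO_ALWAYS_INLINE int quantize_template(int16_t *block, int q,
                                                const uint8_t *scantable,
                                                int direct_zigzag)
{
    /* With a constant flag, one of these two choices is folded away. */
    const uint8_t *scan = direct_zigzag ? demo_zigzag : scantable;
    int last_non_zero = -1;

    assert(q > 0);
    for (int i = 0; i < DEMO_N; i++) {
        int j = scan[i];
        block[j] = (int16_t)(block[j] / q);
        if (block[j])
            last_non_zero = i;
    }
    return last_non_zero;
}

/* Two concrete functions generated from the one template. */
static int quantize_direct(int16_t *block, int q, const uint8_t *scantable)
{
    (void)scantable;  /* scan order is hardcoded in this variant */
    return quantize_template(block, q, scantable, 1);
}

static int quantize_normal(int16_t *block, int q, const uint8_t *scantable)
{
    return quantize_template(block, q, scantable, 0);
}

typedef int (*quantize_fn)(int16_t *block, int q, const uint8_t *scantable);

/* Pick the variant once, e.g. at init time, instead of on every call. */
static quantize_fn select_quantize(const uint8_t *scantable)
{
    return scantable == demo_zigzag ? quantize_direct : quantize_normal;
}
```

The selection function corresponds to setting a function pointer once at init time; the trade-off is two copies of the code instead of one branch per call.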
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 6089 bytes
Desc: not available