[FFmpeg-devel] [PATCH] VP8: avoid conditional and division for chroma MV
Ronald S. Bultje
rsbultje
Wed Jun 23 20:51:30 CEST 2010
Hi,
2010/6/23 M?ns Rullg?rd <mans at mansr.com>:
> Stefan Gehrer <stefan.gehrer at gmx.de> writes:
>> On 06/23/2010 07:45 PM, Ronald S. Bultje wrote:
>>> On Wed, Jun 23, 2010 at 1:40 PM, Stefan Gehrer<stefan.gehrer at gmx.de> ?wrote:
>>>> On 06/23/2010 06:51 PM, Ronald S. Bultje wrote:
>>>>> On Wed, Jun 23, 2010 at 12:44 PM, Stefan Gehrer<stefan.gehrer at gmx.de>
>>>>> ? wrote:
>>>>>> Are there any recommended samples to test against?
>>>>>
>>>>> http://code.google.com/p/webm/downloads/detail?name=vp8-test-vectors-r1.zip&can=2&q=
>>>>
>>>> Okay, I tested some clips from the test vector and
>>>> the frame CRCs stay the same with the patch.
>>>
>>> You can test all of them using a small script (use untested to get ref.md5s):
>>>
>>> rm -f test.md5s
>>> for files in 000 001 002 003 004 005 006 007 008 009 \
>>> ? ? ? ? ? ? ? 010 011 012 013 014 015 016 017; do \
>>> ? ? ?./ffmpeg -i ~/Desktop/vp8-test-vectors-r1/vp80-00-comprehensive-${files}.ivf \
>>> ? ? ? ? ? ? ?-v 0 -y -an -vcodec rawvideo -f md5 - 2>&1 | grep MD5>> ?test.md5s
>>> done
>>> diff -u test.md5s ref.md5s&& ?echo "Results identical"
>>>
>>>> But when I compile on my machine with gcc 4.4.3 amd -O3
>>>> it seems clever enough to avoid conditional and division
>>>> anyway.
>>>> So now I believe in the correctness of the patch, I am
>>>> just not so sure about the usefulness.
>>>
>>> Well, does it make it faster or lead to better assembly?
>>
>> A little bit of both:
>>
>> int uvmv_test_old (int mv)
>> {
>> ? ? return (mv + (mv < 0 ? -2 : 2)) / 4;
>> ? ? ? 90: ? ? ? 8b 54 24 04 ? ? ? ? ? ? mov ? ?0x4(%esp),%edx
>> ? ? ? 94: ? ? ? 89 d0 ? ? ? ? ? ? ? ? ? mov ? ?%edx,%eax
>> ? ? ? 96: ? ? ? c1 f8 1f ? ? ? ? ? ? ? ?sar ? ?$0x1f,%eax
>> ? ? ? 99: ? ? ? 83 e0 fc ? ? ? ? ? ? ? ?and ? ?$0xfffffffc,%eax
>> ? ? ? 9c: ? ? ? 8d 54 10 02 ? ? ? ? ? ? lea ? ?0x2(%eax,%edx,1),%edx
>> ? ? ? a0: ? ? ? 89 d0 ? ? ? ? ? ? ? ? ? mov ? ?%edx,%eax
>> ? ? ? a2: ? ? ? c1 f8 1f ? ? ? ? ? ? ? ?sar ? ?$0x1f,%eax
>> ? ? ? a5: ? ? ? c1 e8 1e ? ? ? ? ? ? ? ?shr ? ?$0x1e,%eax
>> ? ? ? a8: ? ? ? 01 d0 ? ? ? ? ? ? ? ? ? add ? ?%edx,%eax
>> ? ? ? aa: ? ? ? c1 f8 02 ? ? ? ? ? ? ? ?sar ? ?$0x2,%eax
>> ? ? ? ad: ? ? ? c3 ? ? ? ? ? ? ? ? ? ? ?ret
>> }
>>
>> int uvmv_test_new (int mv)
>> {
>> ? ? return (mv + 2 + (mv >> (INT_BIT-1))) >> 2;
>> ? ? ? b0: ? ? ? 8b 44 24 04 ? ? ? ? ? ? mov ? ?0x4(%esp),%eax
>> ? ? ? b4: ? ? ? 89 c2 ? ? ? ? ? ? ? ? ? mov ? ?%eax,%edx
>> ? ? ? b6: ? ? ? c1 fa 1f ? ? ? ? ? ? ? ?sar ? ?$0x1f,%edx
>> ? ? ? b9: ? ? ? 8d 44 10 02 ? ? ? ? ? ? lea ? ?0x2(%eax,%edx,1),%eax
>> ? ? ? bd: ? ? ? c1 f8 02 ? ? ? ? ? ? ? ?sar ? ?$0x2,%eax
>> ? ? ? c0: ? ? ? c3 ? ? ? ? ? ? ? ? ? ? ?ret
>> }
>>
>> Putting START/STOP_TIMER just around those two lines
>> in inter_predict() funtion and decoding
>> vp80-00-comprehensive-002.ivf
>>
>> without patch
>>
>> 680 dezicycles in uvmv, 2048 runs, 0 skips
>>
>> with patch
>>
>> 649 dezicycles in uvmv, 2048 runs, 0 skips
>
> Smaller _and_ faster: the choice is obvious.
Indeed, patch OK. You have SVN access right?
Ronald
More information about the ffmpeg-devel
mailing list