[FFmpeg-devel] [PATCH] VP8: avoid conditional and division for chroma MV
Måns Rullgård
mans
Wed Jun 23 20:19:02 CEST 2010
Stefan Gehrer <stefan.gehrer at gmx.de> writes:
> On 06/23/2010 07:45 PM, Ronald S. Bultje wrote:
>> Hi,
>>
>> On Wed, Jun 23, 2010 at 1:40 PM, Stefan Gehrer<stefan.gehrer at gmx.de> wrote:
>>> On 06/23/2010 06:51 PM, Ronald S. Bultje wrote:
>>>> On Wed, Jun 23, 2010 at 12:44 PM, Stefan Gehrer<stefan.gehrer at gmx.de>
>>>> wrote:
>>>>> Are there any recommended samples to test against?
>>>>
>>>> http://code.google.com/p/webm/downloads/detail?name=vp8-test-vectors-r1.zip&can=2&q=
>>>
>>> Okay, I tested some clips from the test vector and
>>> the frame CRCs stay the same with the patch.
>>
>> You can test all of them using a small script (use untested to get ref.md5s):
>>
>> rm -f test.md5s
>> for files in 000 001 002 003 004 005 006 007 008 009 \
>> 010 011 012 013 014 015 016 017; do \
>> ./ffmpeg -i ~/Desktop/vp8-test-vectors-r1/vp80-00-comprehensive-${files}.ivf \
>> -v 0 -y -an -vcodec rawvideo -f md5 - 2>&1 | grep MD5>> test.md5s
>> done
>> diff -u test.md5s ref.md5s&& echo "Results identical"
>>
>>> But when I compile on my machine with gcc 4.4.3 amd -O3
>>> it seems clever enough to avoid conditional and division
>>> anyway.
>>> So now I believe in the correctness of the patch, I am
>>> just not so sure about the usefulness.
>>
>> Well, does it make it faster or lead to better assembly?
>
> A little bit of both:
>
> int uvmv_test_old (int mv)
> {
> return (mv + (mv < 0 ? -2 : 2)) / 4;
> 90: 8b 54 24 04 mov 0x4(%esp),%edx
> 94: 89 d0 mov %edx,%eax
> 96: c1 f8 1f sar $0x1f,%eax
> 99: 83 e0 fc and $0xfffffffc,%eax
> 9c: 8d 54 10 02 lea 0x2(%eax,%edx,1),%edx
> a0: 89 d0 mov %edx,%eax
> a2: c1 f8 1f sar $0x1f,%eax
> a5: c1 e8 1e shr $0x1e,%eax
> a8: 01 d0 add %edx,%eax
> aa: c1 f8 02 sar $0x2,%eax
> ad: c3 ret
> }
>
> int uvmv_test_new (int mv)
> {
> return (mv + 2 + (mv >> (INT_BIT-1))) >> 2;
> b0: 8b 44 24 04 mov 0x4(%esp),%eax
> b4: 89 c2 mov %eax,%edx
> b6: c1 fa 1f sar $0x1f,%edx
> b9: 8d 44 10 02 lea 0x2(%eax,%edx,1),%eax
> bd: c1 f8 02 sar $0x2,%eax
> c0: c3 ret
> }
>
> Putting START/STOP_TIMER just around those two lines
> in inter_predict() funtion and decoding
> vp80-00-comprehensive-002.ivf
>
> without patch
>
> 680 dezicycles in uvmv, 2048 runs, 0 skips
>
> with patch
>
> 649 dezicycles in uvmv, 2048 runs, 0 skips
Smaller _and_ faster: the choice is obvious.
--
M?ns Rullg?rd
mans at mansr.com
More information about the ffmpeg-devel
mailing list