[FFmpeg-devel] [PATCH] VP8: avoid conditional and division for chroma MV
Stefan Gehrer
stefan.gehrer
Wed Jun 23 20:14:00 CEST 2010
On 06/23/2010 07:45 PM, Ronald S. Bultje wrote:
> Hi,
>
> On Wed, Jun 23, 2010 at 1:40 PM, Stefan Gehrer<stefan.gehrer at gmx.de> wrote:
>> On 06/23/2010 06:51 PM, Ronald S. Bultje wrote:
>>> On Wed, Jun 23, 2010 at 12:44 PM, Stefan Gehrer<stefan.gehrer at gmx.de>
>>> wrote:
>>>> Are there any recommended samples to test against?
>>>
>>> http://code.google.com/p/webm/downloads/detail?name=vp8-test-vectors-r1.zip&can=2&q=
>>
>> Okay, I tested some clips from the test vector and
>> the frame CRCs stay the same with the patch.
>
> You can test all of them using a small script (use untested to get ref.md5s):
>
> rm -f test.md5s
> for files in 000 001 002 003 004 005 006 007 008 009 \
> 010 011 012 013 014 015 016 017; do \
> ./ffmpeg -i ~/Desktop/vp8-test-vectors-r1/vp80-00-comprehensive-${files}.ivf \
> -v 0 -y -an -vcodec rawvideo -f md5 - 2>&1 | grep MD5>> test.md5s
> done
> diff -u test.md5s ref.md5s&& echo "Results identical"
>
>> But when I compile on my machine with gcc 4.4.3 amd -O3
>> it seems clever enough to avoid conditional and division
>> anyway.
>> So now I believe in the correctness of the patch, I am
>> just not so sure about the usefulness.
>
> Well, does it make it faster or lead to better assembly?
A little bit of both:
int uvmv_test_old (int mv)
{
return (mv + (mv < 0 ? -2 : 2)) / 4;
90: 8b 54 24 04 mov 0x4(%esp),%edx
94: 89 d0 mov %edx,%eax
96: c1 f8 1f sar $0x1f,%eax
99: 83 e0 fc and $0xfffffffc,%eax
9c: 8d 54 10 02 lea 0x2(%eax,%edx,1),%edx
a0: 89 d0 mov %edx,%eax
a2: c1 f8 1f sar $0x1f,%eax
a5: c1 e8 1e shr $0x1e,%eax
a8: 01 d0 add %edx,%eax
aa: c1 f8 02 sar $0x2,%eax
ad: c3 ret
}
int uvmv_test_new (int mv)
{
return (mv + 2 + (mv >> (INT_BIT-1))) >> 2;
b0: 8b 44 24 04 mov 0x4(%esp),%eax
b4: 89 c2 mov %eax,%edx
b6: c1 fa 1f sar $0x1f,%edx
b9: 8d 44 10 02 lea 0x2(%eax,%edx,1),%eax
bd: c1 f8 02 sar $0x2,%eax
c0: c3 ret
}
Putting START/STOP_TIMER just around those two lines
in inter_predict() funtion and decoding
vp80-00-comprehensive-002.ivf
without patch
450 dezicycles in uvmv, 1 runs, 0 skips
450 dezicycles in uvmv, 2 runs, 0 skips
450 dezicycles in uvmv, 4 runs, 0 skips
450 dezicycles in uvmv, 8 runs, 0 skips
450 dezicycles in uvmv, 16 runs, 0 skips
455 dezicycles in uvmv, 32 runs, 0 skips
459 dezicycles in uvmv, 64 runs, 0 skips
539 dezicycles in uvmv, 128 runs, 0 skips
620 dezicycles in uvmv, 256 runs, 0 skips
652 dezicycles in uvmv, 512 runs, 0 skips
673 dezicycles in uvmv, 1024 runs, 0 skips
680 dezicycles in uvmv, 2048 runs, 0 skips
with patch
360 dezicycles in uvmv, 1 runs, 0 skips
405 dezicycles in uvmv, 2 runs, 0 skips
427 dezicycles in uvmv, 4 runs, 0 skips
427 dezicycles in uvmv, 8 runs, 0 skips
472 dezicycles in uvmv, 16 runs, 0 skips
551 dezicycles in uvmv, 32 runs, 0 skips
507 dezicycles in uvmv, 64 runs, 0 skips
516 dezicycles in uvmv, 128 runs, 0 skips
592 dezicycles in uvmv, 256 runs, 0 skips
631 dezicycles in uvmv, 512 runs, 0 skips
646 dezicycles in uvmv, 1024 runs, 0 skips
649 dezicycles in uvmv, 2048 runs, 0 skips
Stefan
More information about the ffmpeg-devel
mailing list