[FFmpeg-devel] [PATCH] VP8: avoid conditional and division for chroma MV

Måns Rullgård mans
Wed Jun 23 20:19:02 CEST 2010


Stefan Gehrer <stefan.gehrer at gmx.de> writes:

> On 06/23/2010 07:45 PM, Ronald S. Bultje wrote:
>> Hi,
>>
>> On Wed, Jun 23, 2010 at 1:40 PM, Stefan Gehrer<stefan.gehrer at gmx.de>  wrote:
>>> On 06/23/2010 06:51 PM, Ronald S. Bultje wrote:
>>>> On Wed, Jun 23, 2010 at 12:44 PM, Stefan Gehrer<stefan.gehrer at gmx.de>
>>>>   wrote:
>>>>> Are there any recommended samples to test against?
>>>>
>>>> http://code.google.com/p/webm/downloads/detail?name=vp8-test-vectors-r1.zip&can=2&q=
>>>
>>> Okay, I tested some clips from the test vector and
>>> the frame CRCs stay the same with the patch.
>>
>> You can test all of them using a small script (run it on an unpatched build first to get ref.md5s):
>>
>> rm -f test.md5s
>> for files in 000 001 002 003 004 005 006 007 008 009 \
>>               010 011 012 013 014 015 016 017; do \
>> 	./ffmpeg -i ~/Desktop/vp8-test-vectors-r1/vp80-00-comprehensive-${files}.ivf \
>> 		-v 0 -y -an -vcodec rawvideo -f md5 - 2>&1 | grep MD5 >> test.md5s
>> done
>> diff -u test.md5s ref.md5s && echo "Results identical"
>>
>>> But when I compile on my machine with gcc 4.4.3 and -O3
>>> it seems clever enough to avoid conditional and division
>>> anyway.
>>> So now I believe in the correctness of the patch, I am
>>> just not so sure about the usefulness.
>>
>> Well, does it make it faster or lead to better assembly?
>
> A little bit of both:
>
> int uvmv_test_old (int mv)
> {
>     return (mv + (mv < 0 ? -2 : 2)) / 4;
>       90:       8b 54 24 04             mov    0x4(%esp),%edx
>       94:       89 d0                   mov    %edx,%eax
>       96:       c1 f8 1f                sar    $0x1f,%eax
>       99:       83 e0 fc                and    $0xfffffffc,%eax
>       9c:       8d 54 10 02             lea    0x2(%eax,%edx,1),%edx
>       a0:       89 d0                   mov    %edx,%eax
>       a2:       c1 f8 1f                sar    $0x1f,%eax
>       a5:       c1 e8 1e                shr    $0x1e,%eax
>       a8:       01 d0                   add    %edx,%eax
>       aa:       c1 f8 02                sar    $0x2,%eax
>       ad:       c3                      ret
> }
>
> int uvmv_test_new (int mv)
> {
>     return (mv + 2 + (mv >> (INT_BIT-1))) >> 2;
>       b0:       8b 44 24 04             mov    0x4(%esp),%eax
>       b4:       89 c2                   mov    %eax,%edx
>       b6:       c1 fa 1f                sar    $0x1f,%edx
>       b9:       8d 44 10 02             lea    0x2(%eax,%edx,1),%eax
>       bd:       c1 f8 02                sar    $0x2,%eax
>       c0:       c3                      ret
> }
>
> Putting START/STOP_TIMER just around those two lines
> in the inter_predict() function and decoding
> vp80-00-comprehensive-002.ivf gives:
>
> without patch
>
> 680 dezicycles in uvmv, 2048 runs, 0 skips
>
> with patch
>
> 649 dezicycles in uvmv, 2048 runs, 0 skips

Smaller _and_ faster: the choice is obvious.
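
For anyone who wants to convince themselves that the two expressions quoted
above agree, here is a minimal sketch that compares them over a range of
inputs. The function names uvmv_old, uvmv_new and count_mismatches are made
up for this illustration, and INT_BIT is defined locally on the assumption
that it means the number of bits in an int. The new form also assumes that
right-shifting a negative int is an arithmetic shift, which C leaves
implementation-defined but which holds for gcc on x86.

```c
#include <limits.h>

/* Local stand-in for the INT_BIT macro in the quoted code
 * (assumption: number of bits in an int). */
#define INT_BIT ((int)(CHAR_BIT * sizeof(int)))

/* Old form: add +2 or -2 depending on sign, then divide by 4.
 * C division truncates toward zero, so this rounds to nearest
 * with halves going away from zero. */
int uvmv_old(int mv)
{
    return (mv + (mv < 0 ? -2 : 2)) / 4;
}

/* New form: mv >> (INT_BIT - 1) is -1 for negative mv and 0 otherwise
 * (assuming an arithmetic right shift), so the bias becomes +1 or +2,
 * and the final >> 2 rounds toward negative infinity. */
int uvmv_new(int mv)
{
    return (mv + 2 + (mv >> (INT_BIT - 1))) >> 2;
}

/* Hypothetical helper: count inputs in [lo, hi] where the two forms differ.
 * Keep the range well inside int limits so mv + 2 cannot overflow. */
int count_mismatches(int lo, int hi)
{
    int bad = 0;
    for (int mv = lo; mv <= hi; mv++) {
        if (uvmv_old(mv) != uvmv_new(mv))
            bad++;
    }
    return bad;
}
```

Calling count_mismatches(-32768, 32767) from a small main() should return 0
on such a platform. This is only a spot check of the arithmetic, not a
replacement for running the test vectors.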

-- 
Måns Rullgård
mans at mansr.com


