[FFmpeg-devel] VP8 decoder optimization status

Jason Garrett-Glaser darkshikari
Wed Jun 30 22:54:36 CEST 2010


2010/6/30 Måns Rullgård <mans at mansr.com>:
> Jason Garrett-Glaser <darkshikari at gmail.com> writes:
>
>> On Wed, Jun 30, 2010 at 1:19 PM, Stefan Gehrer <stefan.gehrer at gmx.de> wrote:
>>> On 06/30/2010 10:15 PM, Stefan Gehrer wrote:
>>>>
>>>> On 06/30/2010 08:54 PM, Jason Garrett-Glaser wrote:
>>>>>
>>>>> On Wed, Jun 30, 2010 at 8:55 AM, Stefan Gehrer<stefan.gehrer at gmx.de>
>>>>> wrote:
>>>>>>
>>>>>> On 06/29/2010 04:09 AM, Jason Garrett-Glaser wrote:
>>>>>>>
>>>>>>> Here's a rough guide to what's done and what needs to be done before
>>>>>>> ffmpeg's VP8 decoder is as fast as a politician running away from an
>>>>>>> ethics committee.
>>>>>>
>>>>>> [...]
>>>>>>
>>>>>>> C:
>>>>>>>
>>>>>>> Fully convert vp5/6/7/8 arithmetic coder to bytestream: eliminate the
>>>>>>> looped renormalization.
>>>>>>
>>>>>> Like attached?
>>>>>
>>>>> We should try to reuse the h264 table if possible, IMO.
>>>>
>>>> If we are talking about the same table (ff_h264_norm_shift*),
>>>> it can not be used as is,
>>>> I think these are the options:
>>>>
>>>> 1. shift = 7 - av_log2_16bit(c->high);
>>>>
>>>> 2. shift = 7 - ff_log2_tab[c->high];
>>>>
>>>> 3. shift = ff_h264_norm_shift_old[c->high] + !!c->high;
>>>
>>> 3. shift = ff_h264_norm_shift_old[c->high] + 1;
>>>
>>> as c->high should never become zero.
>>
>> This sounds like the best option: 1) is 3 ops on x86 minimum, as is
>> 2). 3) is at most two ops.
>
> But a clz is often faster than a memory load. Also don't forget you
> have to load the base address of the table from somewhere on
> everything but x86 (and there too with PIC).

Then we should do an ifdef of some sort to get optimal behavior in all
situations.
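
A rough sketch of what such an ifdef might look like, not a patch. All names
here (SKETCH_HAVE_FAST_CLZ, norm_shift_*) are made up for illustration and are
not FFmpeg identifiers. It builds its own 256-entry table instead of reusing
ff_h264_norm_shift_old, and it assumes c->high is an 8-bit value in 1..255
(never zero, as noted above), so the shift is simply 7 - log2(high).

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Reference: the looped renormalization this is meant to replace.
 * Precondition: 1 <= high <= 255 (high == 0 would never terminate). */
static int norm_shift_loop(unsigned high)
{
    int shift = 0;
    while (high < 128) {
        high <<= 1;
        shift++;
    }
    return shift;
}

/* Table variant: one memory load, plus a table base address on most targets.
 * Hypothetical table, filled once at startup. */
static uint8_t norm_shift_tab[256];

static void norm_shift_init(void)
{
    norm_shift_tab[0] = 0; /* unused: high is assumed to be nonzero */
    for (unsigned h = 1; h < 256; h++)
        norm_shift_tab[h] = (uint8_t)norm_shift_loop(h);
}

static int norm_shift_table(unsigned high)
{
    return norm_shift_tab[high];
}

#if defined(__GNUC__)
/* clz variant: __builtin_clz is a GCC/Clang builtin, undefined for 0.
 * Subtracting the bits above the low byte leaves the 8-bit leading-zero count. */
static int norm_shift_clz(unsigned high)
{
    return __builtin_clz(high) - ((int)(sizeof(unsigned) * CHAR_BIT) - 8);
}
#endif

/* Hypothetical build switch: pick clz where it is cheap, the table elsewhere. */
static inline int norm_shift(unsigned high)
{
#if defined(SKETCH_HAVE_FAST_CLZ) && defined(__GNUC__)
    return norm_shift_clz(high);
#else
    return norm_shift_table(high);
#endif
}

/* Checks that every variant agrees with the loop for all valid inputs. */
static void norm_shift_selfcheck(void)
{
    norm_shift_init();
    for (unsigned h = 1; h < 256; h++) {
        assert(norm_shift_table(h) == norm_shift_loop(h));
#if defined(__GNUC__)
        assert(norm_shift_clz(h) == norm_shift_loop(h));
#endif
        assert(norm_shift(h) == norm_shift_loop(h));
    }
}
```

Call norm_shift_init() once before decoding; after that, norm_shift(c->high)
would stand in for the loop. Which branch wins on a given CPU is something to
benchmark, not something this sketch settles.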

Dark Shikari


