[FFmpeg-devel] VP8 decoder optimization status
Måns Rullgård
mans at mansr.com
Wed Jun 30 22:58:14 CEST 2010
Jason Garrett-Glaser <darkshikari at gmail.com> writes:
> 2010/6/30 Måns Rullgård <mans at mansr.com>:
>> Jason Garrett-Glaser <darkshikari at gmail.com> writes:
>>
>>> On Wed, Jun 30, 2010 at 1:19 PM, Stefan Gehrer <stefan.gehrer at gmx.de> wrote:
>>>> On 06/30/2010 10:15 PM, Stefan Gehrer wrote:
>>>>>
>>>>> On 06/30/2010 08:54 PM, Jason Garrett-Glaser wrote:
>>>>>>
>>>>>> On Wed, Jun 30, 2010 at 8:55 AM, Stefan Gehrer<stefan.gehrer at gmx.de>
>>>>>> wrote:
>>>>>>>
>>>>>>> On 06/29/2010 04:09 AM, Jason Garrett-Glaser wrote:
>>>>>>>>
>>>>>>>> Here's a rough guide to what's done and what needs to be done before
>>>>>>>> ffmpeg's VP8 decoder is as fast as a politician running away from an
>>>>>>>> ethics committee.
>>>>>>>
>>>>>>> [...]
>>>>>>>
>>>>>>>> C:
>>>>>>>>
>>>>>>>> Fully convert vp5/6/7/8 arithmetic coder to bytestream: eliminate the
>>>>>>>> looped renormalization.
>>>>>>>
>>>>>>> Like attached?
>>>>>>
>>>>>> We should try to reuse the h264 table if possible, IMO.
>>>>>
>>>>> If we are talking about the same table (ff_h264_norm_shift*),
>>>>> it cannot be used as is.
>>>>> I think these are the options:
>>>>>
>>>>> 1. shift = 7 - av_log2_16bit(c->high);
>>>>>
>>>>> 2. shift = 7 - ff_log2_tab[c->high];
>>>>>
>>>>> 3. shift = ff_h264_norm_shift_old[c->high] + !!c->high;
>>>>
>>>> 3. shift = ff_h264_norm_shift_old[c->high] + 1;
>>>>
>>>> as c->high should never become zero.
>>>
>>> This sounds like the best option: 1) is 3 ops on x86 minimum, as is
>>> 2). 3) is at most two ops.
>>
>> But a clz is often faster than a memory load. Also don't forget you
>> have to load the base address of the table from somewhere on
>> everything but x86 (and there too with PIC).
>
> Then we should do an ifdef of some sort to get optimal behavior in all
> situations.
I guess we need a macro.
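
A minimal sketch of what such a macro could look like, not a patch. It
assumes c->high is always in 1..255 (it should never be zero, as noted
above) and that unsigned int is 32 bits wide. HAVE_FAST_CLZ, NORM_SHIFT
and the norm_shift_* helpers are hypothetical names made up for this
illustration, and the table is a local stand-in, not ff_h264_norm_shift.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical config switch; not an existing FFmpeg define. */
#ifndef HAVE_FAST_CLZ
#define HAVE_FAST_CLZ 1
#endif

/* Reference: the looped renormalization count, i.e. how many left
 * shifts are needed until bit 7 of an 8-bit value is set. */
static int norm_shift_loop(unsigned high)
{
    int shift = 0;
    assert(high > 0 && high < 256);   /* high must never be zero */
    while (high < 128) {
        high <<= 1;
        shift++;
    }
    return shift;
}

/* Table variant: one memory load, plus loading the table address on
 * most non-x86 targets (and on x86 with PIC). Entry 0 is unused. */
static uint8_t norm_shift_tab[256];

static void norm_shift_init(void)
{
    for (unsigned i = 1; i < 256; i++)
        norm_shift_tab[i] = (uint8_t)norm_shift_loop(i);
}

static int norm_shift_table(unsigned high)
{
    return norm_shift_tab[high];
}

/* clz variant: same value as 7 - log2(high) for high in 1..255.
 * Assumes a 32-bit unsigned int; falls back to the loop elsewhere. */
static int norm_shift_clz(unsigned high)
{
#if defined(__GNUC__)
    return __builtin_clz(high) - 24;
#else
    return norm_shift_loop(high);
#endif
}

/* Pick one implementation per target at compile time. */
#if HAVE_FAST_CLZ
#define NORM_SHIFT(high) norm_shift_clz((unsigned)(high))
#else
#define NORM_SHIFT(high) norm_shift_table((unsigned)(high))
#endif
```

With something like this the range coder would call NORM_SHIFT(c->high)
and each architecture would set the switch to whichever variant is
cheaper there. norm_shift_init() has to run once before the table
variant is used.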
--
Måns Rullgård
mans at mansr.com