[FFmpeg-devel] Pipeline: H.264 speed improvements

Jason Garrett-Glaser darkshikari
Wed Dec 24 00:02:40 CET 2008


On Tue, Dec 23, 2008 at 3:41 PM, Måns Rullgård <mans at mansr.com> wrote:
> Michael Niedermayer <michaelni at gmx.at> writes:
>
>> On Tue, Dec 23, 2008 at 04:08:26AM -0500, Jason Garrett-Glaser wrote:
>>> I've put together a list of all the possible speed improvements I can
>>> see, including both some obvious ones and non-obvious ones.  If you're
>>> interested in implementing anything here, say so to make sure your
>>> work isn't duplicated by Michael or me.  Also feel free to discuss some
>>> of the more nutty ideas, like the VLC table, or tell me that I'm wrong
>>> about something.
>>>
>>> Non-assembly stuff:
>> [...]
>>> av_log2 is unnecessarily powerful for use in h264.c.  All signed
>>> golomb values in H.264 fit in 16-bit, and all unsigned golomb values
>>> other than headers fit in 8-bit.  Thus all ordinary unsigned golomb
>>> code reads can literally be put in a 256-byte VLC table and replaced
>>> with a single array lookup.
>>
>> it may be that all ue golomb coded values are <256 outside the headers,
>> though even this seems wrong for mb_skip_run the way i understand the spec.
>> But a value of 255 corresponds to a 15bit long vlc code.
>> a 256 (or 128) entry LUT limits one to values 0-15; 512 (or 1024) to 0-31
>>
>> Now there are surely a few left that are that small but thats far from
>> all non header values.
>
> av_log2() can be trivially implemented on most CPUs using a count
> leading zeros instruction.  That should be even faster than a table.
> On ARM this instruction takes one cycle.

For ARM this can be special-cased.  Intel CPUs have a 1-3 cycle CLZ
(depends on the CPU) but on AMD chips this can cost >10 cycles, so a
table is generally preferred on x86.
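
To make the two options concrete, here is a rough sketch of both: a
floor(log2) built on a 256-entry byte table, and one built on a count
leading zeros instruction.  The function names are made up for this
example, and __builtin_clz is the GCC/Clang builtin (assuming a 32-bit
unsigned int); this is not a copy of the actual av_log2 code.

```c
#include <stdint.h>

/* log2_tab[i] = floor(log2(i)) for i in 1..255; entry 0 is set to 0. */
static uint8_t log2_tab[256];

static void init_log2_tab(void)
{
    log2_tab[0] = 0;
    for (int i = 1; i < 256; i++) {
        int n = 0;
        while ((i >> (n + 1)) != 0)
            n++;
        log2_tab[i] = (uint8_t)n;
    }
}

/* Table variant: narrow the input down to one byte, then look it up. */
static int log2_table(uint32_t v)
{
    int n = 0;
    if (v & 0xffff0000u) { v >>= 16; n += 16; }
    if (v & 0xff00u)     { v >>= 8;  n += 8;  }
    return n + log2_tab[v];
}

/* CLZ variant: __builtin_clz(0) is undefined, so force the low bit on.
 * That makes an input of 0 return 0, same as the table variant. */
static int log2_clz(uint32_t v)
{
    return 31 - __builtin_clz(v | 1);
}
```

Which one wins depends on the target: the CLZ form is a couple of
instructions where the hardware op is cheap, while the table form costs
two branches and a load but does not depend on that instruction's
latency.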

> it may be that all ue golomb coded values are <256 outside the headers,
> though even this seems wrong for mb_skip_run the way i understand the spec.
> But a value of 255 corresponds to a 15bit long vlc code.
> a 256 (or 128) entry LUT limits one to values 0-15; 512 (or 1024) to 0-31
>
> Now there are surely a few left that are that small but thats far from
> all non header values.

Ah yes, you're right.  Skip runs can be larger.
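
For reference, a small sketch of why the table size caps the decodable
range.  An unsigned exp-golomb code is k zero bits, a 1 bit, then k
info bits, so it is 2k+1 bits long, and a table indexed by the next 8
bits of the stream can only resolve codes of 7 bits or less.  All names
below (LUT_BITS, ue_len, ue_val, ue_lookup) are hypothetical and not
taken from h264.c or golomb.h.

```c
#include <stdint.h>

#define LUT_BITS 8
#define LUT_SIZE (1 << LUT_BITS)

static uint8_t ue_len[LUT_SIZE]; /* code length in bits, 0 = too long */
static uint8_t ue_val[LUT_SIZE]; /* decoded value, valid if ue_len != 0 */

static void init_ue_lut(void)
{
    for (int i = 0; i < LUT_SIZE; i++) {
        /* Count leading zeros within the LUT_BITS-wide index. */
        int zeros = 0;
        while (zeros < LUT_BITS && !(i & (1 << (LUT_BITS - 1 - zeros))))
            zeros++;

        int len = 2 * zeros + 1;
        if (len > LUT_BITS) {
            /* The code does not fit in the index; needs a slow path. */
            ue_len[i] = 0;
            ue_val[i] = 0;
        } else {
            /* The top len bits of the index hold value + 1. */
            ue_len[i] = (uint8_t)len;
            ue_val[i] = (uint8_t)((i >> (LUT_BITS - len)) - 1);
        }
    }
}

/* peek must be the next LUT_BITS bits of the stream, MSB first.
 * Returns the number of bits consumed, or 0 if the code is too long
 * for the table and has to be decoded some other way. */
static int ue_lookup(unsigned peek, unsigned *val)
{
    *val = ue_val[peek];
    return ue_len[peek];
}
```

With LUT_BITS set to 8 the largest value this table can return is 14,
so anything that can exceed that, skip runs included, still needs the
general path.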

I know in x264 that we use a 256-size-LUT golomb code writer for
everything not in encoder.c/set.c, and encoder.c/set.c covers skip
runs and headers.
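
Roughly what a 256-entry size table on the writer side can look like,
as a sketch only: this is not the actual x264 code, and BitWriter,
put_bits, ue_size_tab and write_ue_small are names invented for the
example.  The table is indexed by value+1 and stores the code length,
so writing a small value is one lookup plus one bit write.

```c
#include <stddef.h>
#include <stdint.h>

/* ue_size_tab[v + 1] = length in bits of the ue code for v, v <= 254. */
static uint8_t ue_size_tab[256];

static void init_ue_size_tab(void)
{
    ue_size_tab[0] = 0; /* unused: index is value + 1, so never 0 */
    for (int i = 1; i < 256; i++) {
        int k = 0;
        while ((i >> (k + 1)) != 0)
            k++;
        ue_size_tab[i] = (uint8_t)(2 * k + 1);
    }
}

/* Hypothetical minimal MSB-first bit writer; buf must start zeroed
 * and be large enough for everything written. */
typedef struct {
    uint8_t *buf;
    size_t   bitpos;
} BitWriter;

static void put_bits(BitWriter *bw, int n, uint32_t v)
{
    for (int i = n - 1; i >= 0; i--) {
        if ((v >> i) & 1)
            bw->buf[bw->bitpos >> 3] |= (uint8_t)(1u << (7 - (bw->bitpos & 7)));
        bw->bitpos++;
    }
}

/* Only valid for val <= 254.  Writing val + 1 in 2k+1 bits yields the
 * k leading zeros, the marker 1 bit and the k info bits in one go. */
static void write_ue_small(BitWriter *bw, unsigned val)
{
    put_bits(bw, ue_size_tab[val + 1], val + 1);
}
```

Anything that can go past the table, such as skip runs or header
fields, would go through a separate writer that computes the length
directly.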

Dark Shikari



