[FFmpeg-devel] Fw: [foms] Paper submissions to LCA

Måns Rullgård mans
Fri Jul 17 00:38:12 CEST 2009


Jason Garrett-Glaser <darkshikari at gmail.com> writes:

> 2009/7/16 Måns Rullgård <mans at mansr.com>:
>> Jason Garrett-Glaser <darkshikari at gmail.com> writes:
>>
>>> On Thu, Jul 16, 2009 at 6:37 AM, Mike Melanson <mike at multimedia.cx> wrote:
>>>> Frank Barchard wrote:
>>>>>
>>>>> I don't feel qualified to speak for ffmpeg, but 2 potential topics
>>>>> would be Chrome, and subtitles:
>>>>>
>>>>> 1. The Chrome topic, because Chrome/Chromium use ffmpeg to implement
>>>>> html5 video tag.  We could talk about what's great, and not so great,
>>>>> about using ffmpeg, which would hopefully lead to improvements.
>>>>
>>>> That's easy enough: The H.264 decoder is great; the Theora decoder sucks. :)
>>>
>>> The H.264 decoder isn't great because CoreAVC is a crapton faster,
>>> primarily due to better architecture, despite the fact that ffmpeg's
>>> assembly is significantly superior.
>>
>> Could we improve this?
>
> Yes.  Doing the following would make ffmpeg faster than CoreAVC for
> progressive decoding (interlaced/MBAFF is harder, and I don't want to
> get into that).  Some of these would be useful for x264, but I don't
> do them because they would only help at the fastest encoding modes
> (and I don't want to redesign the encoder around such useless modes):
>
> 1.  Template the code twice, once for CABAC, once for CAVLC.
> Interleave entropy decoding and MC/idct.  This means, for example,
> decoding an MV, and immediately performing motion compensation with
> that MV.

This is unfortunately the opposite of what is needed to do part of the
decoding on a DSP or other coprocessor.
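
For anyone unsure what "templating" in point 1 means in C: the usual trick
is an always-inline function that takes a compile-time-constant flag, plus
one thin wrapper per flag value, so the compiler emits a specialised copy
with the branch removed. Below is a minimal sketch of that idea, with the
motion compensation done right after each MV is read. Everything in it
(Decoder, read_mv_cabac, read_mv_cavlc, decode_row, the 1-D "frame") is
hypothetical and only illustrates the pattern; it is not code from h264.c,
and the two entropy readers are trivial stand-ins.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__GNUC__)
#define ALWAYS_INLINE static inline __attribute__((always_inline))
#else
#define ALWAYS_INLINE static inline
#endif

enum { MB_W = 4 };              /* toy "macroblock" width in a 1-D frame */

/* Hypothetical decoder state, for illustration only. */
typedef struct Decoder {
    const int8_t  *syms;        /* stand-in for the entropy-coded stream */
    int            sym_pos;     /* next symbol to read */
    const uint8_t *ref;         /* 1-D reference frame */
    uint8_t       *dst;         /* 1-D output frame */
    int            width;       /* frame width in pixels */
} Decoder;

/* Stand-in "CABAC" reader: one symbol is the signed MV. */
static inline int read_mv_cabac(Decoder *d)
{
    return d->syms[d->sym_pos++];
}

/* Stand-in "CAVLC" reader: a magnitude followed by a sign flag. */
static inline int read_mv_cavlc(Decoder *d)
{
    int mag  = d->syms[d->sym_pos++];
    int sign = d->syms[d->sym_pos++];
    return sign ? -mag : mag;
}

/* The "template": cabac is a constant in each wrapper below, so after
 * inlining the compiler can drop the untaken branch entirely. */
ALWAYS_INLINE void decode_row(Decoder *d, const int cabac)
{
    assert(d->width >= MB_W);
    for (int x = 0; x + MB_W <= d->width; x += MB_W) {
        /* Entropy-decode one MV ... */
        int mv  = cabac ? read_mv_cabac(d) : read_mv_cavlc(d);
        /* ... and motion-compensate with it immediately. */
        int src = x + mv;
        if (src < 0)
            src = 0;
        if (src > d->width - MB_W)
            src = d->width - MB_W;
        memcpy(d->dst + x, d->ref + src, MB_W);
    }
}

/* The two instantiations. */
static void decode_row_cabac(Decoder *d) { decode_row(d, 1); }
static void decode_row_cavlc(Decoder *d) { decode_row(d, 0); }
```

The interleaving inside the loop is exactly what makes it awkward to hand
the MC part to a coprocessor: there is no point at which a whole row of MVs
exists to be shipped off in one go.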

> 2.  Write a paranoid-schizophrenic entropy decoder; separate load_bits
> and get_bits into two functions and only call load_bits when one knows
> that the bit buffer needs to be reloaded.

Isn't this what some of the bitstream readers already do?
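
To make the question concrete, here is a rough sketch of the split as I
read point 2: load_bits tops up a 64-bit cache, and get_bits only consumes
from it and never touches memory. The BitReader type and all the function
names are made up for this example and are not the API of any existing
FFmpeg bitstream reader. The sketch assumes the caller never asks for more
bits than the cache holds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit reader; names are illustrative only. */
typedef struct BitReader {
    const uint8_t *buf;    /* input bytes */
    size_t         size;   /* number of input bytes */
    size_t         pos;    /* next byte to load into the cache */
    uint64_t       cache;  /* unread bits, left-aligned at bit 63 */
    unsigned       bits;   /* number of valid bits in cache */
} BitReader;

static void br_init(BitReader *br, const uint8_t *buf, size_t size)
{
    br->buf   = buf;
    br->size  = size;
    br->pos   = 0;
    br->cache = 0;
    br->bits  = 0;
}

/* Refill: append whole bytes while one still fits in the cache and
 * input remains. Called only when the caller knows the cache is low. */
static void load_bits(BitReader *br)
{
    while (br->bits <= 56 && br->pos < br->size) {
        br->cache |= (uint64_t)br->buf[br->pos++] << (56 - br->bits);
        br->bits  += 8;
    }
}

/* Consume n bits (1..32) from the cache. No refill and no bounds
 * check on the input buffer: the caller must have loaded enough. */
static uint32_t get_bits(BitReader *br, unsigned n)
{
    assert(n >= 1 && n <= 32 && n <= br->bits);
    uint32_t v = (uint32_t)(br->cache >> (64 - n));
    br->cache <<= n;
    br->bits   -= n;
    return v;
}
```

With enough input left, one load_bits call leaves at least 57 bits in the
cache, so a run of short reads can follow it with no further checks.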

> 3.  Use a constant stride instead of variable stride (à la x264).  Use
> ring buffers instead of full-frame data for syntax elements.  Never
> load any pixel data from the frame itself, only from the ring buffer
> and from the left side of the previous macroblock to fill the right
> side of the current one, and so forth.
> 4.  Frame-based multithreading (obviously).
> 5.  Eliminate fill_caches.  Split it into a few separate functions,
> which are only called when needed.  For example, caching intra pred
> data is only called before decoding an i4x4 macroblock, after the
> macroblock header is parsed.

This function always ranks high when profiling, so optimising it seems
like a good idea.

> 6.  Use a better compiler.  MSVC gave me a 10% performance boost on
> CoreAVC; this might just be because it was optimized from the ground
> up for it, I don't know.  Maybe ICC with profiling will do better for
> ffmpeg.

On that topic, does anyone know why so many FATE tests are failing
with icc?

> 7.  Template everything you can get your hands on.  Motion
> compensation functions should be templated for weighted pred, implicit
> weighted pred, bipred, non-bipred, etc.  Decoding functions should be
> templated based on frametype (I, P, B).
> 8.  Borrow every bit of assembly you can get your hands on from x264
> to squeeze out as much performance as possible.
>
> Many of these changes would involve a great deal of refactoring, both
> in h264.c and dsputil.  Some would probably be completely impossible
> to get past a patch review, particularly 2).

What does that say about the review process?

-- 
Måns Rullgård
mans at mansr.com


