[FFmpeg-devel] Fw: [foms] Paper submissions to LCA
Måns Rullgård
Fri Jul 17 00:38:12 CEST 2009
Jason Garrett-Glaser <darkshikari at gmail.com> writes:
> 2009/7/16 Måns Rullgård <mans at mansr.com>:
>> Jason Garrett-Glaser <darkshikari at gmail.com> writes:
>>
>>> On Thu, Jul 16, 2009 at 6:37 AM, Mike Melanson<mike at multimedia.cx> wrote:
>>>> Frank Barchard wrote:
>>>>>
>>>>> I don't feel qualified to speak for ffmpeg, but 2 potential topics
>>>>> would be Chrome, and subtitles:
>>>>>
>>>>> 1. The Chrome topic, because Chrome/Chromium use ffmpeg to implement
>>>>> the HTML5 video tag. We could talk about what's great, and not so great,
>>>>> about using ffmpeg, which would hopefully lead to improvements.
>>>>
>>>> That's easy enough: The H.264 decoder is great; the Theora decoder sucks. :)
>>>
>>> The H.264 decoder isn't great because CoreAVC is a crapton faster,
>>> primarily due to better architecture, despite the fact that ffmpeg's
>>> assembly is significantly superior.
>>
>> Could we improve this?
>
> Yes. Doing the following would make ffmpeg faster than CoreAVC for
> progressive decoding (interlaced/MBAFF is harder, and I don't want to
> get into that). Some of these would be useful for x264, but I don't
> do them because they would only help at the fastest encoding modes
> (and I don't want to redesign the encoder around such useless modes):
>
> 1. Template the code twice, once for CABAC, once for CAVLC.
> Interleave entropy decoding and MC/idct. This means, for example,
> decoding an MV, and immediately performing motion compensation with
> that MV.
This is unfortunately the opposite of what is needed to do part of the
decoding on a DSP or other coprocessor.
> 2. Write a paranoid-schizophrenic entropy decoder; separate load_bits
> and get_bits into two functions and only call load_bits when one knows
> that the bit buffer needs to be reloaded.
Isn't this what some of the bitstream readers already do?
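
For reference, the split described in point 2 could look roughly like the
sketch below. All names here (BitReader, br_init, and the exact signatures of
load_bits and get_bits) are hypothetical and are not taken from FFmpeg's
bitstream reader; the 64-bit cache and zero-padding past the end of the
buffer are assumptions made to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit reader; names are illustrative, not FFmpeg's API. */
typedef struct {
    const uint8_t *buf;  /* input bytes                          */
    size_t size;         /* number of bytes in buf               */
    size_t pos;          /* index of the next byte to load       */
    uint64_t cache;      /* pending bits, aligned to the MSB     */
    int bits;            /* number of valid bits in cache        */
} BitReader;

static void br_init(BitReader *br, const uint8_t *buf, size_t size)
{
    br->buf = buf;
    br->size = size;
    br->pos = 0;
    br->cache = 0;
    br->bits = 0;
}

/* Refill only: top the cache up to at least 57 valid bits.
 * Bytes past the end of the buffer are read as zero. */
static inline void load_bits(BitReader *br)
{
    while (br->bits <= 56) {
        uint64_t byte = br->pos < br->size ? br->buf[br->pos] : 0;
        br->pos++;
        br->cache |= byte << (56 - br->bits);
        br->bits += 8;
    }
}

/* Consume n bits (1..32) with no refill check.
 * The caller must know that at least n bits are cached. */
static inline uint32_t get_bits(BitReader *br, int n)
{
    uint32_t v = (uint32_t)(br->cache >> (64 - n));
    br->cache <<= n;
    br->bits -= n;
    return v;
}
```

A caller that knows it is about to read, say, a 4-bit, an 8-bit and a 12-bit
field calls load_bits once and then get_bits three times, instead of paying
for a refill check on every read.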
> 3. Use a constant stride instead of a variable stride (à la x264). Use
> ring buffers instead of full-frame data for syntax elements. Never
> load any pixel data from the frame itself, only from the ring buffer
> and from the left side of the previous macroblock to fill the right
> side of the current one, and so forth.
> 4. Frame-based multithreading (obviously).
> 5. Eliminate fill_caches. Split it into a few separate functions,
> which are only called when needed. For example, caching intra pred
> data is only called before decoding an i4x4 macroblock, after the
> macroblock header is parsed.
This function always ranks high when profiling, so optimising it seems
like a good idea.
> 6. Use a better compiler. MSVC gave me a 10% performance boost on
> CoreAVC; this might just be because it was optimized from the ground
> up for it, I don't know. Maybe ICC with profiling will do better for
> ffmpeg.
On that topic, does anyone know why so many FATE tests are failing
with icc?
> 7. Template everything you can get your hands on. Motion
> compensation functions should be templated for weighted pred, implicit
> weighted pred, bipred, non-bipred, etc. Decoding functions should be
> templated based on frametype (I, P, B).
> 8. Borrow every bit of assembly you can get your hands on from x264
> to squeeze out as much performance as possible.
>
> Many of these changes would involve a great deal of refactoring, both
> in h264.c and dsputil. Some would probably be completely impossible
> to get past a patch review, particularly 2).
What does that say about the review process?
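
For reference, the "templating" in points 1 and 7 can be approximated in
plain C with an always-inlined generic function whose mode flags are
compile-time constants in each wrapper, so the compiler can drop the untaken
branches. The sketch below is only an illustration: mc_generic, the wrapper
names and mc_table are hypothetical, and the weighting formula is a
simplified stand-in with a fixed 6-bit denominator, not the H.264 weighted
prediction equations.

```c
#include <stdint.h>

#if defined(__GNUC__)
#define ALWAYS_INLINE inline __attribute__((always_inline))
#else
#define ALWAYS_INLINE inline
#endif

/* Generic kernel. 'bipred' and 'weighted' are constants at every call
 * site below, so each wrapper can compile to a branch-free inner loop. */
static ALWAYS_INLINE void mc_generic(uint8_t *dst, const uint8_t *src0,
                                     const uint8_t *src1, int n,
                                     int w0, int w1,
                                     int bipred, int weighted)
{
    for (int i = 0; i < n; i++) {
        int v;
        if (bipred)
            v = weighted ? (src0[i] * w0 + src1[i] * w1 + 32) >> 6
                         : (src0[i] + src1[i] + 1) >> 1;
        else
            v = weighted ? (src0[i] * w0 + 32) >> 6
                         : src0[i];
        dst[i] = (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
    }
}

/* One specialised instance per mode combination. */
#define MC_INSTANCE(name, bipred, weighted)                              \
    static void name(uint8_t *dst, const uint8_t *src0,                  \
                     const uint8_t *src1, int n, int w0, int w1)         \
    {                                                                    \
        mc_generic(dst, src0, src1, n, w0, w1, bipred, weighted);        \
    }

MC_INSTANCE(mc_uni,          0, 0)
MC_INSTANCE(mc_uni_weighted, 0, 1)
MC_INSTANCE(mc_bi,           1, 0)
MC_INSTANCE(mc_bi_weighted,  1, 1)

typedef void (*mc_fn)(uint8_t *dst, const uint8_t *src0,
                      const uint8_t *src1, int n, int w0, int w1);

/* Selected once per block as mc_table[bipred][weighted]. */
static const mc_fn mc_table[2][2] = {
    { mc_uni, mc_uni_weighted },
    { mc_bi,  mc_bi_weighted  },
};
```

The mode decision is then made once per block through mc_table rather than
once per pixel inside the loop.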
--
Måns Rullgård
mans at mansr.com