[FFmpeg-devel] [PATCH] H.264: Faster DC handling (idct_dc and more)
Thu Jan 29 21:03:14 CET 2009
On Thu, Jan 29, 2009 at 11:54 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
> On Thu, Jan 29, 2009 at 03:46:56AM -0800, Jason Garrett-Glaser wrote:
>> On Mon, Jan 26, 2009 at 5:32 PM, Jason Garrett-Glaser
>> <darkshikari at gmail.com> wrote:
>> > The first of these patches modifies ffmpeg to implement x264-style
>> > scan8; this means storing nnz from luma/chroma DC in scan8. This is
>> > extremely useful because it means one can check with a single if
>> > whether luma/chroma DC exists. Normally, you can only do it in the
>> > case of CABAC where the data is stored in the cbp. This also
>> > simplifies decode_residual a lot and will surely be useful for future
>> > optimizations.
>> > The second of these patches ports an 8x8 idct_dc function from x264
>> > which I just wrote (so new, in fact, it isn't in x264 yet!). This one
>> > is LGPL. The entire patch drops chroma idct/dequant time from ~268
>> > clocks to ~205 clocks on a Core 2 Conroe. Note that there *are*
>> > warnings relating to pointers in this patch--suggestions welcome on
>> > the best way to clean them up without making the source messy.
>> > And yes, I tested adding the ifs I used in the DC-only section to the
>> > other chroma decoding section--it makes it slower, probably because
>> > you almost never have chroma AC without chroma DC.
>> > Dark Shikari
>> Did this get lost in the flood of discussion about releases or something? ;)
> No, it got lost in my laziness. I'll look at it of course, I just have many
> other patches to look at as well and I tend to look at easy ones first
> (h264 is never easy :)
Note that when you're done I have some updated asm for the idct_dc
section, courtesy of further work I've done on this in x264.
In addition, as part of the largest single patch I've ever written for
x264 (>70KB), I have an idct_dc for an entire i16x16 block, designed
to handle those pesky i16x16 blocks with no AC coefficients (on a Core
i7 it runs in 31 clocks). This case is also easy to detect at the
bitstream level, since i16x16 coded with "no CBP" implies no AC, but
DC is still allowed. This'll probably go in a separate patch.
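
For anyone following along without the patch open, here is a minimal plain-C sketch of what a DC-only inverse transform boils down to. It is not the asm from the patch or from x264. The names (clip_uint8, idct_dc_add_4x4, idct_dc_add_16x16) and the raster ordering of the 16 DC values are assumptions made for illustration only. The point is that when a 4x4 block has only its DC coefficient, the inverse transform yields one constant for all 16 pixels, so the whole thing reduces to a rounded shift and a clipped add.

```c
#include <stdint.h>

/* Clamp an int to the 0..255 pixel range. */
static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Hypothetical helper: DC-only 4x4 inverse transform plus add.
 * dc_coeff is the already dequantized DC coefficient. With every AC
 * coefficient zero, each residual sample equals (dc_coeff + 32) >> 6.
 * This assumes >> on a negative int is an arithmetic shift, which
 * holds on the usual compilers. */
static void idct_dc_add_4x4(uint8_t *dst, int dc_coeff, int stride)
{
    int dc = (dc_coeff + 32) >> 6;
    for (int y = 0; y < 4; y++) {
        for (int x = 0; x < 4; x++)
            dst[x] = clip_uint8(dst[x] + dc);
        dst += stride;
    }
}

/* Hypothetical helper: the same idea for a whole 16x16 luma block that
 * has DC but no AC. dc_coeffs holds the 16 per-block DC values in
 * raster order here (an assumption; a real decoder has its own block
 * ordering). */
static void idct_dc_add_16x16(uint8_t *dst, const int16_t dc_coeffs[16],
                              int stride)
{
    for (int by = 0; by < 4; by++)
        for (int bx = 0; bx < 4; bx++)
            idct_dc_add_4x4(dst + 4 * by * stride + 4 * bx,
                            dc_coeffs[4 * by + bx], stride);
}
```

A SIMD version would do the same arithmetic on whole rows at once, which is where a cycle count as low as the one quoted above becomes plausible.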