[FFmpeg-devel] [PATCH] mpeg2: fix block_last_index when mismatch control modifies last coeff

Tue Jun 22 06:08:24 CEST 2010

On Mon, Jun 21, 2010 at 8:17 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> On Mon, Jun 21, 2010 at 03:30:32PM -0700, Jason Garrett-Glaser wrote:
> [...]
>> I'm trying to merge parts of a local changeset I have that makes the
>> FLV decoder 30-40% faster overall (many parts may apply to mpegvideo
>> overall).  Some of these parts are unmergeable, but others are quite
>> mergeable.
>
> Please merge what is mergeable; this is greatly appreciated work!
> Also, I am interested in the unmergeable changes: why are they unmergeable?
> Are they public somewhere? This thread shows nicely how discussing
> code that only some people have seen can lead to confusion and
> flames...

Here's the changeset.  The purpose of this was to get realtime,
full-screen video playback on the iPad at 30fps.  The iPad has an
extraordinarily slow display driver, which is synchronous (you can't
decode video while calling display on a texture) and takes up to
8-9 ms per frame.  Combined with CELT audio, this leaves only about
22 ms per 1024x768 frame for video decoding: a massive challenge on a
1 GHz Cortex chip, especially in very high motion at bitrates like
~6 Mbps.
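The ~22 ms figure follows from the 30 fps target.  A rough budget is
below; note that the audio share is inferred from the other numbers,
not stated in the post:

```latex
\begin{aligned}
t_{\text{frame}} &= 1000\,\text{ms} / 30 \approx 33.3\,\text{ms} \\
t_{\text{frame}} - t_{\text{display}} &\approx 33.3 - (8\text{--}9) \approx 24\text{--}25\,\text{ms} \\
t_{\text{audio}} &\approx (24\text{--}25) - 22 \approx 2\text{--}3\,\text{ms} \quad \text{(inferred)}
\end{aligned}
```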

I deliberately tore up the entire MPEG decoder so that only FLV would
work.  Accordingly, keep in mind that this is a GIGANTIC UGLY HACK and
you'd be insane to hold me accountable for how utterly, insanely ugly
any of this is.  But some portion of the changes may be mergeable.

A short summary (not complete; I don't remember everything):

1.  Use idct_dc whenever possible, for both inter and intra.  Add a
NEON idct_dc function written by Mans.
2.  Move dequantization into entropy decoding, to avoid costly calls
to the SIMD dequant function.
3.  Inline the FLV escape-code decoder; this gave a huge benefit.
4.  Inline the h263 block decode.
5.  Remove every single case everywhere that isn't relevant to FLV
decoding (a large share of this gain could be obtained through
templatization, IMO).  Yes, some of the removals are utterly pointless
and were just me deleting code to make my editing space smaller.
6.  Eliminate the mv caching code: do it all in bitstream decoding,
even for 16x16 blocks.
7.  Change the mv cache to 8-bit (this would break anything using unrestricted mvs).
8.  Use write-combining in some places.
9.  Inline the values from the h263 table directly into the code
(e.g. 102 instead of rl->n), and make the tables that aren't
runtime-generated static const in the file to avoid pointer
dereferences.  Generally optimize and clean up the decode_block
function (obviously breaking everything non-FLV).
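Item 2 (dequantizing inside entropy decoding) can be sketched as
follows.  It uses the standard H.263 inter/AC reconstruction rule,
|rec| = qscale * (2|level| + 1), minus 1 when qscale is even, folded
into two per-block constants so that each decoded level is scaled the
moment it is read.  The names dequant_params, make_dequant and
store_coeff are hypothetical; the intra DC coefficient is assumed to be
handled separately, as it is in H.263.

```c
#include <stdint.h>

/* Per-block inverse-quantization constants (hypothetical helper type). */
typedef struct {
    int qmul; /* 2 * qscale */
    int qadd; /* qscale if qscale is odd, qscale - 1 if even */
} dequant_params;

static dequant_params make_dequant(int qscale)
{
    dequant_params p = { qscale << 1, (qscale - 1) | 1 };
    return p;
}

/* Reconstruct one AC level: sign(level) * (|level| * qmul + qadd).
 * A decoded run/level pair never has level == 0. */
static inline int dequant_level(int level, dequant_params p)
{
    return level > 0 ? level * p.qmul + p.qadd
                     : level * p.qmul - p.qadd;
}

/* Hypothetical hook called from the run/level loop: the coefficient is
 * stored already dequantized at its zigzag position, so no separate
 * (SIMD) dequant pass over the block is needed afterwards. */
static inline void store_coeff(int16_t block[64], const uint8_t zigzag[64],
                               int i, int level, dequant_params p)
{
    block[zigzag[i]] = (int16_t)dequant_level(level, p);
}
```

For example, with qscale = 4 a level of 1 reconstructs to 11
(4 * 3 - 1), and with qscale = 5 it reconstructs to 15.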
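Item 3 (inlining the FLV escape decoder) is illustrated below.  The
escape layout assumed here is our reading of FFmpeg's FLV1 encoder, not
something stated in this post: after the ESCAPE code come a 1-bit size
flag, a 1-bit last flag, a 6-bit run, and a signed 7-bit or 11-bit
level.  The tiny MSB-first bit reader is a hypothetical stand-in for
FFmpeg's own get_bits.h; the point is that a static inline escape
decoder can be folded into the block loop with no call overhead.

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal MSB-first bit reader (hypothetical stand-in). */
typedef struct {
    const uint8_t *buf;
    size_t size_bits;
    size_t pos;
} bitreader;

static inline unsigned br_bits(bitreader *br, int n)
{
    unsigned v = 0;
    while (n--) {
        unsigned bit = 0;
        if (br->pos < br->size_bits)
            bit = (br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
        v = (v << 1) | bit;
        br->pos++;
    }
    return v;
}

/* Read n bits as a two's-complement signed value. */
static inline int br_sbits(bitreader *br, int n)
{
    unsigned v    = br_bits(br, n);
    unsigned sign = 1u << (n - 1);
    return (int)(v ^ sign) - (int)sign;
}

/* Assumed FLV1 escape layout, read after the ESCAPE VLC:
 *   1 bit  : level size (0 = 7-bit level, 1 = 11-bit level)
 *   1 bit  : last
 *   6 bits : run
 *   7/11   : signed level */
static inline void flv_decode_escape(bitreader *br, int *last,
                                     int *run, int *level)
{
    int is11 = (int)br_bits(br, 1);
    *last    = (int)br_bits(br, 1);
    *run     = (int)br_bits(br, 6);
    *level   = br_sbits(br, is11 ? 11 : 7);
}
```

For instance, the bytes 0x43 0xF6 decode as last = 1, run = 3,
level = -5, consuming 15 bits.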
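The templatization suggested in item 5 can be done in plain C by
writing one always-inlined body that takes a constant flag and calling
it from thin wrappers, so the compiler emits a specialized copy per
codec with the dead branches removed.  This is a sketch only: the loop
bodies are placeholders, and the ALWAYS_INLINE macro is defined locally
(FFmpeg's equivalent is av_always_inline).

```c
#include <stdint.h>

#if defined(__GNUC__) || defined(__clang__)
#define ALWAYS_INLINE static inline __attribute__((always_inline))
#else
#define ALWAYS_INLINE static inline
#endif

/* Shared body.  'is_flv' is only ever passed as a literal, so after
 * inlining every 'if (is_flv)' is a compile-time constant and the
 * unused path is dropped from each specialized copy. */
ALWAYS_INLINE int decode_block_internal(const int16_t *levels, int n,
                                        int is_flv)
{
    int sum = 0;
    for (int i = 0; i < n; i++) {
        if (is_flv)
            sum += levels[i];     /* placeholder for the FLV-only path */
        else
            sum += 2 * levels[i]; /* placeholder for the generic path */
    }
    return sum;
}

/* One wrapper per "template instantiation". */
static int decode_block_flv(const int16_t *levels, int n)
{
    return decode_block_internal(levels, n, 1);
}

static int decode_block_generic(const int16_t *levels, int n)
{
    return decode_block_internal(levels, n, 0);
}
```

This keeps a single source for both paths, which is the main advantage
over deleting the non-FLV cases by hand.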
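Item 8 does not say what "write-combining" refers to.  Assuming it
means merging adjacent narrow stores into one wider store (it could
instead mean writing through write-combined memory), a minimal sketch
for motion vectors looks like this; the mv16 type and both functions
are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* A motion vector stored as two 16-bit components. */
typedef struct {
    int16_t x, y;
} mv16;

_Static_assert(sizeof(mv16) == 4, "mv16 must pack into 32 bits");

/* Narrow stores: two 16-bit writes per vector. */
static void fill_mvs_narrow(mv16 *dst, int count, int16_t x, int16_t y)
{
    for (int i = 0; i < count; i++) {
        dst[i].x = x;
        dst[i].y = y;
    }
}

/* Combined stores: pack both components into one 32-bit word once, then
 * write it with a 4-byte memcpy, which compilers typically lower to a
 * single 32-bit store. */
static void fill_mvs_combined(mv16 *dst, int count, int16_t x, int16_t y)
{
    mv16 v = { x, y };
    uint32_t packed;
    memcpy(&packed, &v, sizeof(packed));
    for (int i = 0; i < count; i++)
        memcpy(&dst[i], &packed, sizeof(packed));
}
```

Both functions produce identical memory contents; an optimizing
compiler may already merge the narrow version, so the explicit form
mainly makes the intent visible.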

Overall, this basically eliminated all overhead outside of residual
decoding, IDCT, and MC.  MC was a tiny part of overall decoding, and
residual decoding was probably 30-40% faster or more.  IDCT got much
faster with the addition of idct_dc, which triggers in a shockingly
large percentage of all blocks.
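The idct_dc shortcut from item 1 works because a block whose only
nonzero coefficient is DC inverse-transforms to a constant, so the
8x8 IDCT collapses to adding one value to 64 pixels.  The sketch below
is for the inter (add) case; the function names and the (dc + 4) >> 3
normalization are assumptions, and a bitexact decoder must reproduce
exactly the rounding its own full IDCT yields for a DC-only block.  In
a real decoder the test would be on the last-coefficient index the
entropy decoder already tracks (block_last_index == 0) rather than a
scan.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp to the 0..255 pixel range. */
static inline uint8_t clip_uint8(int v)
{
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Return 1 if only block[0] (DC) is nonzero.  Illustration only. */
static int block_is_dc_only(const int16_t block[64])
{
    for (int i = 1; i < 64; i++)
        if (block[i])
            return 0;
    return 1;
}

/* DC-only IDCT-and-add: every output sample gets the same offset.
 * Assumed normalization: DC-only block -> (dc + 4) >> 3 per pixel.
 * Assumes arithmetic right shift of negative values. */
static void idct_dc_add_8x8(uint8_t *dst, ptrdiff_t stride,
                            const int16_t block[64])
{
    int dc = (block[0] + 4) >> 3;
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            dst[x] = clip_uint8(dst[x] + dc);
        dst += stride;
    }
}
```

An intra version would store clip_uint8(dc) instead of adding to the
prediction.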

Amazingly, it's still bitexact as far as I've measured.  But of course
it breaks everything else.

Dark Shikari
-------------- next part --------------
A non-text attachment was scrubbed...
Name: destroy_mpeg_decoder5.diff
Type: application/octet-stream
Size: 107035 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100621/049cf18f/attachment.obj>