[FFmpeg-devel] VP8 decoder optimization status

Tue Jun 29 04:09:04 CEST 2010

Here's a rough guide to what's done and what needs to be done before
ffmpeg's VP8 decoder is as fast as a politician running away from an
ethics committee.

x86 asm:

Done:
6-tap motion compensation
bilinear motion compensation
dc-only iDCT
luma dc WHT
i16x16 intra pred
i4x4 intra pred (V, DC, TM)

TODO:
Normal loopfilter
Simple loopfilter
regular iDCT (patch by Ronald is on the ML)
i4x4 intra pred (DDL, DDR, VR, HD, VL, HU)
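
For anyone unfamiliar with the kernels listed above, here is roughly
what the asm replaces in the TM (TrueMotion) case: each predicted pixel
is left + above - topleft, clamped to 0..255. This is a minimal scalar
sketch, assuming the row above and the column to the left of the block
are already decoded. pred4x4_tm and clamp255 are names made up for
illustration, not the decoder's actual functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp an intermediate value to the 8-bit pixel range. */
static uint8_t clamp255(int v) { return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v; }

/* Hypothetical scalar reference for 4x4 TM prediction.
 * dst points at the top-left pixel of the block; the pixels at
 * dst[-stride - 1 .. -stride + 3] and dst[y * stride - 1] must be valid. */
static void pred4x4_tm(uint8_t *dst, ptrdiff_t stride) {
    assert(dst != NULL);
    const uint8_t *above = dst - stride;
    int topleft = above[-1];
    for (int y = 0; y < 4; y++) {
        int left = dst[y * stride - 1];
        for (int x = 0; x < 4; x++)
            dst[y * stride + x] = clamp255(left + above[x] - topleft);
    }
}
```

The SIMD versions do the same arithmetic on a whole row at once, with
saturating packs standing in for the explicit clamp.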

ARM/PPC asm: nothing done yet

C:

Fully convert the vp5/6/7/8 arithmetic coder to bytestream reads,
eliminating the looped renormalization.
Port all of x264's and ffh264's optimizations once the above is done
(since they'll be relevant at that point).
Convert the vp5/6/7/8 arithmetic coder to use a larger cache size (maybe
16-bit or 32-bit?) for fewer bytestream reads.
Optimize decode_block_coeffs (it can surely be made faster).
Improve edge emulation handling (we currently have the worst of both
worlds -- we require padding on the edges, yet we use the slow
ff_emulated_edge_mc -- we should pick one method or the other).
Optimize cache handling (mvs and nnz).
Optimize MV prediction.
Probably lots of other stuff I haven't thought of, feel free to
contribute ideas.
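
To illustrate the renormalization item from the C list: below is a
sketch of a VP8-style boolean decoder with the per-bit renormalization
loop, next to a variant that refills 16 bits at a time and normalizes
with a single table-driven shift. All names (LoopDec, FastDec,
norm_shift, count_mismatches, ...) are hypothetical, reads past the end
of the buffer are assumed to return zero, and this is not the actual
ffmpeg, x264 or ffh264 code, just the general idea.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Input cursor. Reads past the end return 0 (an assumption of this sketch). */
typedef struct { const uint8_t *p, *end; } Input;

static uint32_t next_byte(Input *in) { return in->p < in->end ? *in->p++ : 0; }

/* Reference decoder: renormalizes one bit per loop iteration. */
typedef struct { Input in; uint32_t value, range; int bit_count; } LoopDec;

static void loop_init(LoopDec *d, const uint8_t *buf, size_t len) {
    d->in.p = buf;
    d->in.end = buf + len;
    d->value = next_byte(&d->in) << 8;
    d->value |= next_byte(&d->in);
    d->range = 255;
    d->bit_count = 0;
}

static int loop_read(LoopDec *d, int prob) {
    uint32_t split = 1 + (((d->range - 1) * (uint32_t)prob) >> 8);
    int bit = d->value >= (split << 8);
    if (bit) {
        d->range -= split;
        d->value -= split << 8;
    } else {
        d->range = split;
    }
    while (d->range < 128) {            /* the looped renormalization */
        d->value <<= 1;
        d->range <<= 1;
        if (++d->bit_count == 8) {
            d->bit_count = 0;
            d->value |= next_byte(&d->in);
        }
    }
    return bit;
}

/* Loop-free variant: top-aligned 32-bit window, refilled 16 bits at a time. */
typedef struct { Input in; uint32_t value, range; int bits; } FastDec;

static uint8_t norm_shift[256];         /* shift that brings range back to >= 128 */

static void fast_init(FastDec *d, const uint8_t *buf, size_t len) {
    for (int r = 1; r < 256; r++) {     /* one-time table setup */
        int s = 0;
        while ((r << s) < 128)
            s++;
        norm_shift[r] = (uint8_t)s;
    }
    d->in.p = buf;
    d->in.end = buf + len;
    d->value = 0;
    d->range = 255;
    d->bits = 0;                        /* valid bits currently in value */
}

static int fast_read(FastDec *d, int prob) {
    uint32_t split = 1 + (((d->range - 1) * (uint32_t)prob) >> 8);
    if (d->bits < 16) {                 /* refill just below the valid bits */
        uint32_t two = next_byte(&d->in) << 8;
        two |= next_byte(&d->in);
        d->value |= two << (16 - d->bits);
        d->bits += 16;
    }
    int bit = d->value >= (split << 24);
    if (bit) {
        d->range -= split;
        d->value -= split << 24;
    } else {
        d->range = split;
    }
    assert(d->range >= 1 && d->range <= 255);
    int shift = norm_shift[d->range];   /* single shift instead of a loop */
    d->range <<= shift;
    d->value <<= shift;
    d->bits -= shift;
    return bit;
}

/* Counts symbols on which the two decoders disagree (expected: none). */
static int count_mismatches(const uint8_t *buf, size_t len, int n) {
    LoopDec a;
    FastDec b;
    int bad = 0;
    loop_init(&a, buf, len);
    fast_init(&b, buf, len);
    for (int i = 0; i < n; i++) {
        int prob = 1 + (i * 37) % 255;  /* arbitrary probabilities in 1..255 */
        bad += loop_read(&a, prob) != fast_read(&b, prob);
    }
    return bad;
}
```

Running count_mismatches() over any buffer is a cheap way to check that
the two variants decode the same symbols. A wider window (64-bit, say)
would cut the number of refills further, at the cost of a slightly more
involved end-of-buffer check.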

The current top priority for x86 speed is far and away the Normal
loopfilter -- it's something like 60-70%+ of the total time, since
we've SIMD-optimized nearly everything else of note.

Dark Shikari