[FFmpeg-devel] [HACK] 50% faster H.264 decoding
Thu Aug 19 00:28:45 CEST 2010
On Wed, Aug 18, 2010 at 12:42:11PM -0400, Ronald S. Bultje wrote:
> On Tue, Aug 17, 2010 at 1:35 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> > On Tue, Aug 17, 2010 at 11:01:03AM -0400, Ronald S. Bultje wrote:
> >> On Mon, Aug 16, 2010 at 6:40 PM, Jason Garrett-Glaser
> >> <darkshikari at gmail.com> wrote:
> >> > On Mon, Aug 16, 2010 at 3:35 PM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
> >> >> Hi,
> >> >>
> >> >> On Wed, Aug 11, 2010 at 5:32 PM, Jason Garrett-Glaser
> >> >> <darkshikari at gmail.com> wrote:
> >> >>> 13. Use MPEG-2 MC for chroma MC, since we know that MVs are
> >> >>> fullpel-only. Simplify edge emulation stuff accordingly too.
> >> >>
> >> >> Does h264 chroma subpel actually use a memcpy shortcut if it's
> >> >> fullpel? I don't remember exactly, but I don't think it has such a
> >> >> shortcut for chroma, only for luma.
> >> >
> >> > It doesn't. It should at least have a shortcut for the 0,0 motion
> >> > vector because it has very high probability (relative to other fullpel
> >> > motion vectors that result in no chroma interpolation). For other
> >> > cases, it might or might not be worthwhile to add a branch in the asm
> >> > to the 1D-only case.
> >> Attached sets up framework for that. The  functions can be copied
> >> straight from VP8 (they are pixel_copy functions, with very fast
> >> aligned implementations for all relevant archs) and others, and should
> >> make VC-1, RV3/4, h264, H264/MPEG etc. significantly faster for the
> >> MVxy==0 case. The / functions are probably going to be faster as
> >> well but that would need some testing to see how big the effect is.
> >>  is the function as-is now, which should obviously stay the way it
> >> is.
> >> Michael, OK to apply this? It's mostly just changing all kind of files
> > if it's not slower ...
> Same speed. Attached is an updated version that fixes a bug in one of
> the fate samples where mx gets changed and thus we called the wrong
> I've tested this version with a semi-finished patch that splits up the
> h264 chroma MC functions (particularly the mc8 ones) into smaller
> ones, thus having cleaner (and unbranched) handling of mx==0/my==0.
> This will remove most (if not all) of the branching, which might give
> a minor speedup, and also removes a little duplicate code (in the
> binary, not source), e.g. the fullpel handling between
> mmx/3dnow/mmx2/ssse3 rv40/h264/vc1 mc8 is identical (it's all
> put_pixels8_mmx) and only needs a single function. I'm only doing this
> for the C and x86 ones because I can't test any of the others.
> After that's done, I plan to do a third patch which will add fullpel
> or 1D-filter versions for mc4/mc2 as well, which should actually
> provide a speedup for code on our desktops, as we saw for Jason's
>  arm/dsputil_init_neon.c |   32 ++++++++++---
>  cavs.c                  |   13 ++---
>  dsputil.c               |   40 +++++++++++++---
>  dsputil.h               |   12 ++--
>  h264.c                  |   24 +++++----
>  mpegvideo.c             |   28 ++++++-----
>  ppc/h264_altivec.c      |   20 ++++++--
>  rv34.c                  |    9 ++-
>  rv40dsp.c               |   20 ++++++--
>  sh4/dsputil_align.c     |   30 +++++++++---
>  vc1dec.c                |   33 +++++++------
>  vp6.c                   |    6 +-
>  x86/dsputil_mmx.c       |  118 +++++++++++++++++++++++++++++++++++++-----------
>  13 files changed, 272 insertions(+), 113 deletions(-)
> 183027123a1213b2e037504a01d87c9c0678c1db h264-chroma-mvzero-shortcut.patch
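
For readers without the attachment, the idea discussed above (a plain copy for
MVxy==0, a 1D filter when only one of mx/my is nonzero, and the full bilinear
filter otherwise) can be sketched roughly as below. This is an illustration
only, not the attached patch: every name here (chroma_mc_fn, chroma_mc_copy,
chroma_mc_1d, chroma_mc_2d, chroma_mc) is hypothetical and is not the dsputil
API, and it assumes H.264-style 1/8-pel bilinear weights with mx, my in 0..7.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical names throughout; this is not FFmpeg's dsputil API. */
typedef void (*chroma_mc_fn)(uint8_t *dst, const uint8_t *src,
                             int stride, int w, int h, int mx, int my);

/* General 2D bilinear filter, weights in 1/8-pel units (mx, my in 0..7).
 * Note that it reads one extra column and one extra row of src. */
static void chroma_mc_2d(uint8_t *dst, const uint8_t *src,
                         int stride, int w, int h, int mx, int my)
{
    const int A = (8 - mx) * (8 - my), B = mx * (8 - my);
    const int C = (8 - mx) * my,       D = mx * my;

    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++)
            dst[x] = (A * src[x]          + B * src[x + 1] +
                      C * src[x + stride] + D * src[x + stride + 1] + 32) >> 6;
        dst += stride;
        src += stride;
    }
}

/* Exactly one of mx/my is nonzero: filter along that axis only. */
static void chroma_mc_1d(uint8_t *dst, const uint8_t *src,
                         int stride, int w, int h, int mx, int my)
{
    const int frac = mx ? mx : my;      /* the nonzero fraction         */
    const int step = mx ? 1 : stride;   /* horizontal or vertical tap   */

    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++)
            dst[x] = ((8 - frac) * src[x] + frac * src[x + step] + 4) >> 3;
        dst += stride;
        src += stride;
    }
}

/* mx == 0 && my == 0: no interpolation at all, just copy the block. */
static void chroma_mc_copy(uint8_t *dst, const uint8_t *src,
                           int stride, int w, int h, int mx, int my)
{
    (void)mx;
    (void)my;
    for (int y = 0; y < h; y++)
        memcpy(dst + y * stride, src + y * stride, (size_t)w);
}

/* Pick the variant once per block instead of branching inside the loops. */
static void chroma_mc(uint8_t *dst, const uint8_t *src,
                      int stride, int w, int h, int mx, int my)
{
    static const chroma_mc_fn tab[4] = {
        chroma_mc_copy,   /* mx == 0, my == 0 */
        chroma_mc_1d,     /* mx != 0, my == 0 */
        chroma_mc_1d,     /* mx == 0, my != 0 */
        chroma_mc_2d,     /* mx != 0, my != 0 */
    };
    tab[(mx != 0) | ((my != 0) << 1)](dst, src, stride, w, h, mx, my);
}
```

The shortcuts are bit-exact with the 2D filter: with mx == my == 0 the weights
are A = 64 and B = C = D = 0, so (64 * s + 32) >> 6 == s, and with only one
nonzero fraction the 2D sum is 8 times the 1D sum, so shifting by 6 or by 3
gives the same result. The copy variant also never touches the extra row and
column that the 2D filter reads, which is why edge emulation can be simpler
in the fullpel case.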
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
When the tyrant has disposed of foreign enemies by conquest or treaty, and
there is nothing more to fear from them, then he is always stirring up
some war or other, in order that the people may require a leader. -- Plato