[FFmpeg-devel] [PATCH] faster vp6 decoding
Jason Garrett-Glaser
darkshikari
Mon Feb 9 10:27:58 CET 2009
+ "punpcklbw %%mm7, %%mm0\n\t" \
+ "punpcklbw %%mm7, %%mm1\n\t" \
+ "punpckhbw %%mm7, %%mm3\n\t" \
+ "punpckhbw %%mm7, %%mm4\n\t" \
+ "pmullw 0(%2), %%mm0\n\t" /* src[x-8 ] * biweight [0] */ \
+ "pmullw 8(%2), %%mm1\n\t" /* src[x ] * biweight [1] */ \
+ "pmullw 0(%2), %%mm3\n\t" /* src[x-8 ] * biweight [0] */ \
+ "pmullw 8(%2), %%mm4\n\t" /* src[x ] * biweight [1] */ \
+ "paddw %%mm1, %%mm0\n\t" \
+ "paddw %%mm4, %%mm3\n\t" \
This can be done faster with pmaddubsw (SSSE3-only, but surely worth
making another version for). Worthwhile if you make an SSE version.
It works by interleaving the weights, which lets you avoid the unpacks,
use only two multiplies, and avoid the adds too, I think. If I'm
right, that makes the whole thing quite a bit less than half the
instructions.
Dark Shikari