[FFmpeg-devel] [PATCH] faster vp6 decoding
Sebastien Lucas
sebastien.lucas
Mon Feb 9 13:17:56 CET 2009
On Mon, Feb 9, 2009 at 10:27 AM, Jason Garrett-Glaser
<darkshikari at gmail.com> wrote:
> + "punpcklbw %%mm7, %%mm0\n\t" \
> + "punpcklbw %%mm7, %%mm1\n\t" \
> + "punpckhbw %%mm7, %%mm3\n\t" \
> + "punpckhbw %%mm7, %%mm4\n\t" \
> + "pmullw 0(%2), %%mm0\n\t" /* src[x-8] * biweight[0] */ \
> + "pmullw 8(%2), %%mm1\n\t" /* src[x]   * biweight[1] */ \
> + "pmullw 0(%2), %%mm3\n\t" /* src[x-8] * biweight[0] */ \
> + "pmullw 8(%2), %%mm4\n\t" /* src[x]   * biweight[1] */ \
> + "paddw %%mm1, %%mm0\n\t" \
> + "paddw %%mm4, %%mm3\n\t" \
>
> This can be done faster with pmaddubsw (SSSE3-only, but worth making
> another version surely). Worthwhile if you make an SSE version.
> Works by interleaving the weights, allowing you to avoid the unpacks,
> use only two multiplies, and avoid the adds, too, I think. If I'm
> right, that makes the entire thing quite a bit less than half the
> instructions.
>
Thanks for the idea.
But apart from my work laptop, my newest computer doesn't even have SSE2, so
I'll leave that job to someone else.