[FFmpeg-devel] [PATCH] faster vp6 decoding

Sebastien Lucas sebastien.lucas
Mon Feb 9 13:17:56 CET 2009


On Mon, Feb 9, 2009 at 10:27 AM, Jason Garrett-Glaser
<darkshikari at gmail.com> wrote:
> +    "punpcklbw %%mm7, %%mm0\n\t"                                \
> +    "punpcklbw %%mm7, %%mm1\n\t"                                \
> +    "punpckhbw %%mm7, %%mm3\n\t"                                \
> +    "punpckhbw %%mm7, %%mm4\n\t"                                \
> +    "pmullw  0(%2), %%mm0\n\t" /* src[x-8 ] * biweight [0] */   \
> +    "pmullw  8(%2), %%mm1\n\t" /* src[x   ] * biweight [1] */   \
> +    "pmullw  0(%2), %%mm3\n\t" /* src[x-8 ] * biweight [0] */   \
> +    "pmullw  8(%2), %%mm4\n\t" /* src[x   ] * biweight [1] */   \
> +    "paddw %%mm1, %%mm0\n\t"                                    \
> +    "paddw %%mm4, %%mm3\n\t"                                    \
>
> This can be done faster with pmaddubsw (SSSE3-only, but worth making
> another version surely).  Worthwhile if you make an SSE version.
> Works by interleaving the weights, allowing you to avoid the unpacks,
> use only two multiplies, and avoid the adds, too, I think.  If I'm
> right, that makes the entire thing quite a bit less than half the
> instructions.
>

Thanks for the idea.

But apart from my work laptop, my newest computer doesn't even have SSE2,
so I'll leave that job to someone else.
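
For whoever picks this up, the pmaddubsw suggestion quoted above can be
modelled in scalar C. This is a rough sketch, not the patch's code and not
a tested SSSE3 implementation. The function names are made up for
illustration. It assumes both weights are non-negative, fit in a signed
byte, and sum to at most 128, so nothing overflows 16 bits. It also leaves
out how the two source bytes end up adjacent in the register.

```c
#include <assert.h>
#include <stdint.h>

/* Saturate to the signed 16-bit range, as pmaddubsw does on its pair sum. */
static int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Scalar model of one 16-bit lane of pmaddubsw: two unsigned bytes times
 * two signed bytes, products added with signed saturation. */
static int16_t pmaddubsw_lane(uint8_t a0, uint8_t a1, int8_t b0, int8_t b1)
{
    return sat16((int32_t)a0 * b0 + (int32_t)a1 * b1);
}

/* What the quoted MMX code does per pixel: widen each source byte to
 * 16 bits (punpck*bw with zero), multiply by its weight (pmullw), then
 * add the two products (paddw). */
static int16_t blend_unpack_mul_add(uint8_t s0, uint8_t s1, int w0, int w1)
{
    int16_t p0 = (int16_t)(s0 * w0);
    int16_t p1 = (int16_t)(s1 * w1);
    return (int16_t)(p0 + p1);
}

/* The suggested form: weights interleaved as (w0, w1) byte pairs, so one
 * multiply-add replaces the unpack, both multiplies and the add.
 * ASSUMPTION (hypothetical constraint): 0 <= w0, w1 <= 127, w0 + w1 <= 128. */
static int16_t blend_pmaddubsw(uint8_t s0, uint8_t s1, int w0, int w1)
{
    assert(w0 >= 0 && w1 >= 0 && w0 <= 127 && w1 <= 127 && w0 + w1 <= 128);
    return pmaddubsw_lane(s0, s1, (int8_t)w0, (int8_t)w1);
}
```

Under those assumptions both functions return the same value for every
pair of source bytes. Outside them they can differ, since pmaddubsw
saturates the sum while paddw wraps.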



