[FFmpeg-devel] [PATCH] faster vp6 decoding

Jason Garrett-Glaser darkshikari
Wed Feb 11 18:03:35 CET 2009


On Wed, Feb 11, 2009 at 7:28 AM, Zuxy Meng <zuxy.meng at gmail.com> wrote:
> Hi,
>
> 2009/2/9 Jason Garrett-Glaser <darkshikari at gmail.com>:
>> +    "punpcklbw %%mm7, %%mm0\n\t"                                \
>> +    "punpcklbw %%mm7, %%mm1\n\t"                                \
>> +    "punpckhbw %%mm7, %%mm3\n\t"                                \
>> +    "punpckhbw %%mm7, %%mm4\n\t"                                \
>> +    "pmullw  0(%2), %%mm0\n\t" /* src[x-8 ] * biweight [0] */   \
>> +    "pmullw  8(%2), %%mm1\n\t" /* src[x   ] * biweight [1] */   \
>> +    "pmullw  0(%2), %%mm3\n\t" /* src[x-8 ] * biweight [0] */   \
>> +    "pmullw  8(%2), %%mm4\n\t" /* src[x   ] * biweight [1] */   \
>> +    "paddw %%mm1, %%mm0\n\t"                                    \
>> +    "paddw %%mm4, %%mm3\n\t"                                    \
>>
>> This can be done faster with pmaddubsw (SSSE3-only, but worth making
>> another version surely).
>
> Sure but that would require weights to be stored as arrays of int8_t
> instead of int16_t?

Nothing a punpcklbw can't solve, unless the weights actually require
more than 7 bits of precision.
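
To make the precision caveat concrete, here is a rough scalar model in C of
one 16-bit lane of each approach. It is only a sketch, not code from the
patch: the function names are made up for illustration, and it assumes the
two source pixels have already been interleaved into adjacent bytes, with
the weights narrowed to signed bytes.

```c
#include <assert.h>
#include <stdint.h>

/* One 16-bit lane of the quoted MMX sequence: bytes widened against zero
 * with punpck*bw, pmullw (keeps the low 16 bits of each product), then
 * paddw (wrapping add). Hypothetical helper, for illustration only. */
static int16_t mullw_paddw_lane(uint8_t a, int16_t w0, uint8_t b, int16_t w1)
{
    uint16_t p0 = (uint16_t)((int32_t)a * w0);
    uint16_t p1 = (uint16_t)((int32_t)b * w1);
    return (int16_t)(uint16_t)(p0 + p1);
}

/* One 16-bit lane of pmaddubsw: two unsigned bytes times two signed
 * bytes, with the products added using signed saturation. */
static int16_t maddubsw_lane(uint8_t a, int8_t w0, uint8_t b, int8_t w1)
{
    int32_t sum = (int32_t)a * w0 + (int32_t)b * w1;
    if (sum > INT16_MAX) sum = INT16_MAX;
    if (sum < INT16_MIN) sum = INT16_MIN;
    return (int16_t)sum;
}

/* A weight is usable with pmaddubsw only if it fits a signed byte. */
static int fits_int8(int16_t w)
{
    return w >= INT8_MIN && w <= INT8_MAX;
}

/* Returns 1 if both models agree for every possible pixel pair. */
static int lanes_agree(int16_t w0, int16_t w1)
{
    assert(fits_int8(w0) && fits_int8(w1));
    for (int a = 0; a < 256; a++)
        for (int b = 0; b < 256; b++)
            if (mullw_paddw_lane((uint8_t)a, w0, (uint8_t)b, w1) !=
                maddubsw_lane((uint8_t)a, (int8_t)w0, (uint8_t)b, (int8_t)w1))
                return 0;
    return 1;
}
```

With small weights such as (3, 5), lanes_agree() returns 1. The two models
can diverge once the weighted sum leaves the int16_t range, because
pmaddubsw saturates where paddw wraps, so the real weight range would need
checking before switching.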

>> Worthwhile if you make an SSE version.
>
> SSE2?

Of course.

Dark Shikari



