[FFmpeg-devel] [PATCH] faster vp6 decoding
Jason Garrett-Glaser <darkshikari at gmail.com>
Wed Feb 11 18:03:35 CET 2009
On Wed, Feb 11, 2009 at 7:28 AM, Zuxy Meng <zuxy.meng at gmail.com> wrote:
> Hi,
>
> 2009/2/9 Jason Garrett-Glaser <darkshikari at gmail.com>:
>> + "punpcklbw %%mm7, %%mm0\n\t" \
>> + "punpcklbw %%mm7, %%mm1\n\t" \
>> + "punpckhbw %%mm7, %%mm3\n\t" \
>> + "punpckhbw %%mm7, %%mm4\n\t" \
>> + "pmullw 0(%2), %%mm0\n\t" /* src[x-8 ] * biweight [0] */ \
>> + "pmullw 8(%2), %%mm1\n\t" /* src[x ] * biweight [1] */ \
>> + "pmullw 0(%2), %%mm3\n\t" /* src[x-8 ] * biweight [0] */ \
>> + "pmullw 8(%2), %%mm4\n\t" /* src[x ] * biweight [1] */ \
>> + "paddw %%mm1, %%mm0\n\t" \
>> + "paddw %%mm4, %%mm3\n\t" \
>>
>> This can be done faster with pmaddubsw (SSSE3-only, but worth making
>> another version surely).
>
> Sure but that would require weights to be stored as arrays of int8_t
> instead of int16_t?
Nothing a punpcklbw can't solve, unless the weights actually require
more than 7 bits of precision.
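
To make the 7-bit remark concrete, here is a rough scalar C model (not code from the patch) of one 16-bit lane of each approach. pmaddubsw multiplies unsigned bytes from one operand by signed bytes from the other and adds adjacent products with signed saturation, so each weight has to fit in an int8_t and the pair sum has to stay inside int16_t range. The function names and the example weights mentioned afterwards are made up for illustration; the actual VP6 weight values do not appear in this thread.

```c
#include <stdint.h>

/* Scalar model of one 16-bit lane of pmaddubsw:
 * unsigned byte * signed byte, adjacent products added with
 * signed 16-bit saturation. */
static int16_t maddubs_lane(uint8_t a0, uint8_t a1, int8_t b0, int8_t b1)
{
    int32_t sum = (int32_t)a0 * b0 + (int32_t)a1 * b1;
    if (sum > INT16_MAX) sum = INT16_MAX;
    if (sum < INT16_MIN) sum = INT16_MIN;
    return (int16_t)sum;
}

/* Scalar model of one lane of the quoted sequence:
 * punpck?bw against zero, two pmullw (low 16 bits of each product),
 * then paddw (wraps modulo 2^16). */
static uint16_t mullw_paddw_lane(uint8_t s0, uint8_t s1, int16_t w0, int16_t w1)
{
    uint16_t p0 = (uint16_t)((uint32_t)s0 * (uint16_t)w0);
    uint16_t p1 = (uint16_t)((uint32_t)s1 * (uint16_t)w1);
    return (uint16_t)(p0 + p1);
}

/* Returns 1 if both int16_t weights can be stored as int8_t unchanged. */
static int weights_fit_int8(int16_t w0, int16_t w1)
{
    return w0 >= INT8_MIN && w0 <= INT8_MAX &&
           w1 >= INT8_MIN && w1 <= INT8_MAX;
}

/* Counts the (s0, s1) byte pairs for which the two models disagree.
 * Returns -1 if the weights do not fit in int8_t at all. */
static int count_mismatches(int16_t w0, int16_t w1)
{
    int mismatches = 0;
    if (!weights_fit_int8(w0, w1))
        return -1;
    for (int s0 = 0; s0 < 256; s0++) {
        for (int s1 = 0; s1 < 256; s1++) {
            uint16_t a = (uint16_t)maddubs_lane((uint8_t)s0, (uint8_t)s1,
                                                (int8_t)w0, (int8_t)w1);
            uint16_t b = mullw_paddw_lane((uint8_t)s0, (uint8_t)s1, w0, w1);
            if (a != b)
                mismatches++;
        }
    }
    return mismatches;
}
```

With hypothetical weights 3 and 5, count_mismatches(3, 5) finds no difference between the two paths. With 100 and 100, the pmaddubsw model saturates on bright pixels and the results diverge, so the weight range matters and not only the storage type.
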
>> Worthwhile if you make an SSE version.
>
> SSE2?
Of course.

Dark Shikari