[FFmpeg-devel] [PATCH] faster vp6 decoding

Thu Feb 12 00:29:26 CET 2009

Zuxy Meng wrote:

> Hi,
> 
> 2009/2/9 Jason Garrett-Glaser <darkshikari at gmail.com>:
> > +    "punpcklbw %%mm7, %%mm0\n\t"                                \
> > +    "punpcklbw %%mm7, %%mm1\n\t"                                \
> > +    "punpckhbw %%mm7, %%mm3\n\t"                                \
> > +    "punpckhbw %%mm7, %%mm4\n\t"                                \
> > +    "pmullw  0(%2), %%mm0\n\t" /* src[x-8 ] * biweight [0] */   \
> > +    "pmullw  8(%2), %%mm1\n\t" /* src[x   ] * biweight [1] */   \
> > +    "pmullw  0(%2), %%mm3\n\t" /* src[x-8 ] * biweight [0] */   \
> > +    "pmullw  8(%2), %%mm4\n\t" /* src[x   ] * biweight [1] */   \
> > +    "paddw %%mm1, %%mm0\n\t"                                    \
> > +    "paddw %%mm4, %%mm3\n\t"                                    \
> >
> > This can be done faster with pmaddubsw (SSSE3-only, but worth making
> > another version surely).
> 
> Sure but that would require weights to be stored as arrays of int8_t
> instead of int16_t?
> 
> > Worthwhile if you make an SSE version.
> 
> SSE2?
> 
> > Works by interleaving the weights, allowing you to avoid the unpacks,
> > use only two multiplies, and avoid the adds, too, I think.  If I'm
> > right, that makes the entire thing quite a bit less than half the
> > instructions.
> 
> I tried something like the code below and it's about 15% faster on my
> Pentium M. The speedup should be more pronounced on modern CPUs with a
> 128-bit FADD unit:
> 
> [...some asm code...]

Nice. I fixed it so that it works on x86_64 and cleaned it up.
It works but produces some small visible artifacts.
It would be great if you could fix the attached patch so that it gives
bit-exact results with:
  ffmpeg -i sample.flv -f framecrc out.crc
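
When chasing bit-exactness between the pmullw/paddw path and a
pmaddubsw path, it may help to model one 16-bit lane of each in plain
C. The sketch below is only an illustration, not a diagnosis of the
attached patch: the function names and the weight values are made up
for the example and are not taken from the VP6 code. It relies on the
documented behaviour that pmaddubsw multiplies unsigned bytes by signed
bytes and adds adjacent products with signed saturation, while
pmullw/paddw keep the low 16 bits and wrap.

```c
#include <stdint.h>

/* Hypothetical scalar model of one 16-bit lane of pmaddubsw:
 * unsigned source bytes a0/a1, signed 8-bit weights w0/w1,
 * adjacent products added with signed saturation. */
static int16_t maddubs_lane(uint8_t a0, uint8_t a1, int8_t w0, int8_t w1)
{
    int32_t sum = (int32_t)a0 * w0 + (int32_t)a1 * w1;
    if (sum > INT16_MAX) sum = INT16_MAX;
    if (sum < INT16_MIN) sum = INT16_MIN;
    return (int16_t)sum;
}

/* Hypothetical scalar model of one lane of the unpack + pmullw + paddw
 * sequence: each product keeps its low 16 bits, the add wraps. */
static int16_t mullw_add_lane(uint8_t a0, uint8_t a1, int16_t w0, int16_t w1)
{
    uint16_t p0 = (uint16_t)((int32_t)a0 * w0);
    uint16_t p1 = (uint16_t)((int32_t)a1 * w1);
    return (int16_t)(uint16_t)(p0 + p1);
}

/* Count the source byte pairs for which the two models disagree
 * for a given (illustrative) pair of weights. */
static int count_mismatches(int8_t w0, int8_t w1)
{
    int mismatches = 0;
    for (int a0 = 0; a0 < 256; a0++)
        for (int a1 = 0; a1 < 256; a1++)
            if (maddubs_lane((uint8_t)a0, (uint8_t)a1, w0, w1) !=
                mullw_add_lane((uint8_t)a0, (uint8_t)a1, w0, w1))
                mismatches++;
    return mismatches;
}
```

With weights such as (64, 64) the sum never exceeds the int16_t range,
so the two models agree for every input. With (127, 127) the
pmaddubsw model saturates where the other wraps. A weight of 128 does
not fit in int8_t at all, which is the storage concern raised above.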

Aurel
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vp6dsp_sse2.diff
Type: text/x-patch
Size: 8674 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20090212/acd044f1/attachment.bin>