[FFmpeg-devel] [PATCH] faster vp6 decoding
Thu Feb 12 14:08:04 CET 2009
2009/2/12 Sebastien Lucas <sebastien.lucas at gmail.com>:
> On Thu, Feb 12, 2009 at 12:29 AM, Aurelien Jacobs <aurel at gnuage.org> wrote:
>> Nice. I fixed it so that it works on x86_64 and I cleaned it up.
>> It works but has some small visible artifacts.
>> It would be great if you could fix attached patch so that it gives
>> bitexact result with:
>> ffmpeg -i sample.flv -f framecrc out.crc
> I fixed the code (it's attached), but I don't know if it's clean
> enough. Zuxy was using ff_pw_64 (in fact the MMX code should use it),
> which is only an int64, and with his SSE2 code we need an int128.
> So I added the round_64 variable in vp6dsp_sse2.c. I also fixed the
> overflow/sign problems.
Just expand ff_pw_64 to 128 bits, similar to ff_pw_28, without introducing
another variable. Of course you will have to fix some type conflicts in
dsputil_mmx.h and other places, but that is trivial.
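
To illustrate what "expanding" the constant means, here is a minimal
standalone sketch, not the actual dsputil code. The names xmm_reg_sketch,
pw_64_sketch and pw_64_lane are made up for this example; only the idea
(64 replicated in eight 16-bit lanes, 16-byte aligned) comes from the
discussion above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 128-bit register type: two 64-bit halves. */
typedef struct { uint64_t lo, hi; } xmm_reg_sketch;

static_assert(sizeof(xmm_reg_sketch) == 16, "must fill one SSE2 register");

/* The word 64 (0x0040) replicated in all eight 16-bit lanes. The 16-byte
 * alignment is what an aligned 128-bit load would require. */
static const _Alignas(16) xmm_reg_sketch pw_64_sketch = {
    0x0040004000400040ULL, 0x0040004000400040ULL
};

/* Read 16-bit lane i (0..7) of the constant. */
static uint16_t pw_64_lane(int i)
{
    uint16_t lanes[8];
    memcpy(lanes, &pw_64_sketch, sizeof(lanes));
    return lanes[i];
}
```

In this layout the first 8 bytes still hold the original 64-bit pattern,
so a 64-bit read from the same address sees the same value as before.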
> I can confirm it's bitexact, but I can not test the speed (testing on
> a virtual machine).
I just tested on my Pentium M, and the SSE2 version is about 12% faster.
BTW, you can optimize a bit more for x86_64 by loading ff_pw_64 into xmm8
before the loop, so as to avoid a memory read inside the loop. :-)
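
A rough sketch of that hoisting, written with SSE2 intrinsics rather than
the inline asm in the patch. The function round_shift7 is hypothetical and
only shows an "add 64, shift right by 7" rounding step; it is not the vp6
filter itself. With intrinsics the compiler picks the register, so whether
the constant actually stays in xmm8 (or any register) is up to it. A scalar
fallback is included for builds without SSE2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: dst[i] = (src[i] + 64) >> 7 for n int16_t values.
 * Assumes n is a multiple of 8 and src[i] + 64 fits in an int16_t. */
static void round_shift7(int16_t *dst, const int16_t *src, size_t n)
{
    assert(n % 8 == 0);
#if defined(__SSE2__)
    /* Set up the rounding constant once, before the loop, so the loop body
     * does not need to fetch it from memory on every iteration. */
    const __m128i round = _mm_set1_epi16(64);
    for (size_t i = 0; i < n; i += 8) {
        __m128i v = _mm_loadu_si128((const __m128i *)(src + i));
        v = _mm_srai_epi16(_mm_add_epi16(v, round), 7);
        _mm_storeu_si128((__m128i *)(dst + i), v);
    }
#else
    /* Scalar fallback with the same result for in-range inputs. */
    const int round = 64;
    for (size_t i = 0; i < n; i++)
        dst[i] = (int16_t)((src[i] + round) >> 7);
#endif
}
```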
Beauty is truth,
While truth is beauty.
PGP KeyID: E8555ED6