[FFmpeg-devel] [PATCH] faster vp6 decoding
Thu Feb 12 14:11:24 CET 2009
2009/2/12 Zuxy Meng <zuxy.meng at gmail.com>:
>> I fixed the code (it's attached), but I don't know if it's clean
>> enough. Zuxy was using ff_pw_64 (in fact the MMX code should use it),
>> which is only an int64, and with his SSE2 code we need an int128.
>> So I added the round_64 variable in vp6dsp_sse2.c. I also fixed the
>> overflow/sign problems.
> Just expand ff_pw_64 the same way as ff_pw_28, w/o the need to introduce
> another var. Of course you have to fix some type conflicts in
> dsputil_mmx.h and other places, but that is trivial.
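To make the "expand ff_pw_64" idea concrete, here is a rough, portable C
sketch of what a 128-bit version of the constant could look like: eight
16-bit lanes that each hold 64, stored as two 64-bit halves with 16-byte
alignment. The names xmm_const, pw_64_wide, pw_lane and round_shift7 are
made up for illustration and are not the real FFmpeg declarations; the
"+64, >>7" rounding is what I assume the filter does with the constant.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit constant type (NOT FFmpeg's actual type). */
typedef struct {
    uint64_t lo, hi;
} xmm_const;

/* An SSE2 register is 16 bytes, so the constant must be exactly that wide. */
static_assert(sizeof(xmm_const) == 16, "constant must fill one XMM register");

/* Eight 16-bit words, each 0x0040 (= 64), aligned for aligned SSE2 loads. */
_Alignas(16) static const xmm_const pw_64_wide = {
    0x0040004000400040ULL, 0x0040004000400040ULL
};

/* Return 16-bit lane i (0..7) of the constant. */
static int16_t pw_lane(const xmm_const *c, int i)
{
    uint64_t half = (i < 4) ? c->lo : c->hi;
    return (int16_t)((half >> (16 * (i & 3))) & 0xFFFF);
}

/* Assumed use of the constant: round to nearest when dividing by 128. */
static int round_shift7(int sum)
{
    return (sum + pw_lane(&pw_64_wide, 0)) >> 7;
}
```

With only the 64-bit version, the upper four lanes of an XMM load would
come from whatever follows the constant in memory, which is why the wider
declaration is needed for the SSE2 code.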
>> I can confirm it's bitexact, but I cannot test the speed (testing on
>> a virtual machine).
> I just tested on my Pentium M and the SSE2 version is about 12% faster.
I mean the time spent in ff_vp6_filter_diag4_mmx/sse2() themselves,
not the overall decoding time of course.
> BTW you can optimize for x64 a bit more by reading ff_pw_64 into xmm8
> so as to avoid a memory read inside the loop. :-)
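For illustration, the hoisting idea looks roughly like this with SSE2
intrinsics. This is only a sketch: round_shift7_sse2 is a made-up helper,
not the actual vp6 filter, and in C the compiler decides which register
holds the constant. In hand-written asm on x86-64 one would pin it in one
of the extra registers (xmm8..xmm15) before entering the loop.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/*
 * Hypothetical helper: dst[i] = (src[i] + 64) >> 7 for n 16-bit values.
 * n is assumed to be a multiple of 8.
 */
static void round_shift7_sse2(int16_t *dst, const int16_t *src, int n)
{
    /* Rounding constant is set up once, outside the loop. */
    const __m128i round = _mm_set1_epi16(64);

    for (int i = 0; i < n; i += 8) {
        __m128i v = _mm_loadu_si128((const __m128i *)(src + i));
        v = _mm_adds_epi16(v, round);  /* paddsw: saturating add */
        v = _mm_srai_epi16(v, 7);      /* psraw: arithmetic shift right */
        _mm_storeu_si128((__m128i *)(dst + i), v);
    }
}
```

The loop body then contains only the load of the source data; the constant
no longer costs a memory read per iteration.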
> Beauty is truth,
> While truth is beauty.
> PGP KeyID: E8555ED6