[FFmpeg-devel] [FFmpeg-devel-irc] IRC log for 2010-02-19
Jason Garrett-Glaser
darkshikari
Mon Feb 22 00:06:43 CET 2010
On Sun, Feb 21, 2010 at 6:57 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
> On Sat, Feb 20, 2010 at 12:00:54AM +0000, irc at mansr.com wrote:
>> [00:00:16] <mru> if speed matters you should use asm
>> [00:00:27] <Dark_Shikari> we only have inline asm on x86 atm
>> [00:00:47] <mru> sometimes inline asm is the best solution
>> [00:01:17] <Dark_Shikari> hmm. michael's idea seems to hurt in x264.
>> [00:02:02] <Dark_Shikari> in fact, michael's idea would hurt in ffmpeg if ffmpeg had inline SIMD like x264 did
>> [00:02:05] <Dark_Shikari> for that code
>> [00:02:11] <Dark_Shikari> we use simd for the following
>> [00:02:12] <Dark_Shikari> MIN(((x+28)*2184)>>16,2) = (x>2) + (x>32)
>> [00:02:20] <Dark_Shikari> on two values at once
>> [00:02:28] <Dark_Shikari> left side is simd, right side is C
>
> any volunteers who would send a patch?
Note that I've since changed that code locally, partly inspired by
your patch ;)
Here's the current asm, which calculates (x>2)+(x>32) for two values
at once. I don't think it's much better than C anymore; the main
advantage before was that it saved 2 abs() calls, but your idea
eliminates the need for that.
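static const uint64_t pb_2  = 0x0202020202020202ULL;
static const uint64_t pb_32 = 0x2020202020202020ULL;
int amvd;
asm(
    "movd      %1, %%mm0 \n" /* mm0 = left MVD bytes */
    "movd      %2, %%mm1 \n" /* mm1 = top MVD bytes */
    "paddb  %%mm1, %%mm0 \n" /* per-byte sum: left + top */
    "pxor   %%mm2, %%mm2 \n" /* mm2 = 0 (accumulator) */
    "movq   %%mm0, %%mm1 \n" /* keep a second copy of the sums */
    "pcmpgtb   %3, %%mm0 \n" /* mm0 = (sum > 2)  ? 0xff : 0 */
    "pcmpgtb   %4, %%mm1 \n" /* mm1 = (sum > 32) ? 0xff : 0 */
    "psubb  %%mm0, %%mm2 \n" /* subtracting -1 adds 1 per true compare */
    "psubb  %%mm1, %%mm2 \n"
    "movd   %%mm2, %0    \n" /* low two bytes hold the two results */
    :"=r"(amvd)
    :"m"(M16( mvdleft )),"m"(M16( mvdtop )),
     "m"(pb_2),"m"(pb_32)
);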
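For anyone who wants to compare against plain C, here is a rough
scalar sketch of what the asm above computes, plus the
multiply-and-shift form from the IRC log. The function names and the
byte layout (one component in the low byte, the other in the high
byte) are my assumptions for illustration, not code from x264 or
ffmpeg.

```c
#include <assert.h>
#include <stdint.h>

/* Per-byte result: (sum > 2) + (sum > 32). */
static int ctx_inc(int sum)
{
    return (sum > 2) + (sum > 32);
}

/* Hypothetical scalar stand-in for the MMX sequence. left and top each
 * hold two byte-sized MVD magnitudes (assumed layout: one component in
 * the low byte, the other in the high byte). Returns the two results
 * packed the same way. */
static uint16_t amvd_scalar(uint16_t left, uint16_t top)
{
    int sum_lo = (left & 0xff) + (top & 0xff);
    int sum_hi = (left >> 8) + (top >> 8);
    /* pcmpgtb is a signed byte compare, so each sum must stay below 128 */
    assert(sum_lo < 128 && sum_hi < 128);
    return (uint16_t)(ctx_inc(sum_lo) | (ctx_inc(sum_hi) << 8));
}

/* The multiply-and-shift form quoted from the IRC log. */
static int ctx_inc_mul(int sum)
{
    int v = ((sum + 28) * 2184) >> 16;
    return v < 2 ? v : 2;
}
```

The two forms agree for every sum the byte representation can
produce (0 to 66), which is why the compare-based version can replace
the multiply.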
Note how the input is bytes (!). Here's the trick: MVD values only
have to be 0 to 33; any larger value tells us nothing. Maybe there's
some scaling that goes on with MBAFF, but even then it's only 0-65.
As a result, you can store MVD values as uint8_ts, saving enormous
amounts of memory and cache and making fill_rectangle faster.
Obviously this requires a little bit of extra clipping, but my
benchmarks in x264 show that it's worth it there.
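A minimal sketch of the clipping idea, assuming the non-MBAFF bound
of 33 mentioned above. MVD_CLIP and the function names are made up
for illustration; this is not the actual x264 or ffmpeg code. The
point is that clipping each magnitude to 33 before storing it leaves
(sum > 2) + (sum > 32) unchanged.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MVD_CLIP 33 /* hypothetical name; 33 is the bound given above */

/* Store |mvd| as a byte, clipped to MVD_CLIP. */
static uint8_t mvd_to_u8(int mvd)
{
    int a = abs(mvd);
    return (uint8_t)(a < MVD_CLIP ? a : MVD_CLIP);
}

/* Reference: result computed from the unclipped MVDs. */
static int ctx_inc_full(int left, int top)
{
    int sum = abs(left) + abs(top);
    return (sum > 2) + (sum > 32);
}

/* Same result computed from the clipped byte values. */
static int ctx_inc_u8(uint8_t left, uint8_t top)
{
    /* inputs are expected to have gone through mvd_to_u8() already */
    assert(left <= MVD_CLIP && top <= MVD_CLIP);
    int sum = left + top;
    return (sum > 2) + (sum > 32);
}
```

If either magnitude exceeds 33 it is stored as 33, so the sum is
still above 32 and the result is 2 either way; smaller values are
stored unchanged.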
This change is probably vastly more useful than the above asm (which I
make available under LGPL in case anyone cares, but it's probably
near-useless once the other changes are done).
Dark Shikari