[FFmpeg-devel] [PATCH] MMX2/SSSE3 VC1 loop filter
David Conrad
lessen42
Mon Jul 5 23:19:58 CEST 2010
On Jul 5, 2010, at 5:02 PM, Jason Garrett-Glaser wrote:
> On Mon, Jul 5, 2010 at 1:30 PM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
>> Hi,
>>
>> On Mon, Jul 5, 2010 at 1:44 AM, David Conrad <lessen42 at gmail.com> wrote:
>>> Updated so that it applies cleanly and compiles, and added MMX/SSE2 versions
>> [..]
>>> +SECTION_RODATA
>>> +pw_4: times 8 dw 4
>>> +pw_5: times 8 dw 5
>>
>> cextern pw_4, pw_5 (i.e. use the ones in dsputil_mmx.c) maybe?
>>
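For context on what pw_4 and pw_5 are for, here is a hedged scalar C sketch, based on my understanding of the VC-1 in-loop filter's edge-strength term a0 = (2*(p1 - q1) - 5*(p0 - q0) + 4) >> 3. Here p1 and p0 are the two pixels before the edge and q0 and q1 the two after it. The 5 is the multiplier and the 4 is the rounding bias before the shift. The helper name and pixel naming are illustrative and do not come from the patch; `times 8 dw N` is modelled as an 8-entry array of 16-bit words, which is one XMM register's worth.

```c
#include <assert.h>
#include <stdint.h>

/* "times 8 dw 4" / "times 8 dw 5": eight 16-bit words, i.e. one full XMM
 * register; an 8-byte MMX load from the same label sees the first four. */
static const int16_t pw_4[8] = {4, 4, 4, 4, 4, 4, 4, 4};
static const int16_t pw_5[8] = {5, 5, 5, 5, 5, 5, 5, 5};
static_assert(sizeof(pw_4) == 16, "pw_4 should fill one XMM register");

/* Hypothetical helper: the edge-strength term for 8 pixel columns,
 *   a0 = (2 * (p1 - q1) - 5 * (p0 - q0) + 4) >> 3
 * with each constant supplying one operand per 16-bit lane. */
static void vc1_edge_term8(const uint8_t p1[8], const uint8_t p0[8],
                           const uint8_t q0[8], const uint8_t q1[8],
                           int16_t a0[8])
{
    for (int i = 0; i < 8; i++) {
        /* |intermediate| <= 510 + 1275 + 4, so 16-bit lanes suffice. */
        int t = 2 * (p1[i] - q1[i]) - pw_5[i] * (p0[i] - q0[i]) + pw_4[i];
        /* Arithmetic shift, like psraw; >> on a negative int is
         * implementation-defined in C but arithmetic on GCC and Clang. */
        a0[i] = (int16_t)(t >> 3);
    }
}
```

Sharing the existing constants through cextern, as suggested, would not change this arithmetic. It only avoids a second copy in .rodata.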
>>> +; low, high (src), zero
>>> +%macro UNPACK2 4
>>> + mova m%2, m%3
>>> + punpckh%1 m%3, m%4
>>> + punpckl%1 m%2, m%4
>>> +%endmacro
>>
>> duplicate of SBUTTERFLY in x86util.asm, maybe?
>>
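The following is a scalar C model of what UNPACK2 does when its last operand is a zero register. It is a sketch on 8-byte MMX registers, and the helper names are hypothetical. punpcklbw and punpckhbw interleave the bytes of two registers, which is the interleave that, per the remark above, SBUTTERFLY already wraps. Interleaving with zeros simply zero-extends each byte to a 16-bit word.

```c
#include <assert.h>
#include <stdint.h>

/* punpcklbw a, b: interleave the low four bytes as a0 b0 a1 b1 a2 b2 a3 b3. */
static void punpcklbw8(const uint8_t a[8], const uint8_t b[8], uint8_t out[8])
{
    for (int i = 0; i < 4; i++) {
        out[2 * i]     = a[i];
        out[2 * i + 1] = b[i];
    }
}

/* punpckhbw a, b: the same interleave applied to the high four bytes. */
static void punpckhbw8(const uint8_t a[8], const uint8_t b[8], uint8_t out[8])
{
    for (int i = 0; i < 4; i++) {
        out[2 * i]     = a[i + 4];
        out[2 * i + 1] = b[i + 4];
    }
}

/* Read byte pair i of an 8-byte register as a little-endian 16-bit word. */
static uint16_t word_at(const uint8_t r[8], int i)
{
    assert(i >= 0 && i < 4);
    return (uint16_t)(r[2 * i] | (r[2 * i + 1] << 8));
}

/* UNPACK2 bw, lo, hi, zero: the low and high halves of src become two
 * vectors of four zero-extended words. */
static void unpack2_bw_zero(const uint8_t src[8], uint16_t lo[4], uint16_t hi[4])
{
    static const uint8_t zero[8] = {0};
    uint8_t l[8], h[8];

    punpcklbw8(src, zero, l);
    punpckhbw8(src, zero, h);
    for (int i = 0; i < 4; i++) {
        lo[i] = word_at(l, i);
        hi[i] = word_at(h, i);
    }
}
```

In the asm, the `mova` at the top of UNPACK2 exists because punpck overwrites its destination. The model sidesteps this by building both results in temporaries.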
>>> +%macro STORE_4_WORDS_MMX 6
>>> + movd %6, %5
>>> +%if mmsize==16
>>> + psrldq %5, 4
>>> +%else
>>> + psrlq %5, 32
>>> +%endif
>>> + mov %1, %6w
>>> + shr %6, 16
>>> + mov %2, %6w
>>> + movd %6, %5
>>> + mov %3, %6w
>>> + shr %6, 16
>>> + mov %4, %6w
>>> +%endmacro
>>
>> For the VP8 H loop filter, I save the two neighbouring rows (p1/q1) and
>> write all four out at once as dwords, using movd from the mm register.
>> Have you tried that (I'm not asking you to rewrite it if you didn't)?
>> If so, is it faster?
>>
>> (I suppose this isn't very practical because of the SSE4 version below...)
>>
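To make the store macro easier to review, here is a scalar C model of STORE_4_WORDS_MMX. It is a sketch, and the function names are hypothetical. From the macro body, I take %1 to %4 as the four destination rows, %5 as the register holding four 16-bit words (presumably the filtered pixel pair on either side of the edge, one word per row), and %6 as a scratch GPR. Each movd yields two words, which are written with 16-bit stores after shifting the GPR.

```c
#include <stdint.h>

/* A 16-bit x86 mov to memory stores little-endian; do the same byte by
 * byte so the model does not depend on host endianness. */
static void store_le16(uint8_t *dst, uint32_t v)
{
    dst[0] = (uint8_t)(v & 0xff);
    dst[1] = (uint8_t)((v >> 8) & 0xff);
}

/* Scalar model of STORE_4_WORDS_MMX d0, d1, d2, d3, m, gpr on a 64-bit
 * MMX register holding four 16-bit words; word i goes to row di. */
static void store_4_words_mmx(uint8_t *d0, uint8_t *d1, uint8_t *d2,
                              uint8_t *d3, uint64_t m)
{
    uint32_t gpr = (uint32_t)m; /* movd  gpr, m   (low dword)          */
    m >>= 32;                   /* psrlq m, 32    (psrldq m, 4 on XMM)  */
    store_le16(d0, gpr);        /* mov   [d0], gprw                    */
    gpr >>= 16;                 /* shr   gpr, 16                       */
    store_le16(d1, gpr);        /* mov   [d1], gprw                    */
    gpr = (uint32_t)m;          /* movd  gpr, m   (former high dword)  */
    store_le16(d2, gpr);        /* mov   [d2], gprw                    */
    gpr >>= 16;                 /* shr   gpr, 16                       */
    store_le16(d3, gpr);        /* mov   [d3], gprw                    */
}
```

The VP8-style alternative would instead issue one 32-bit store per row. That removes the shr and the 16-bit stores, at the cost of keeping p1/q1 live until the store.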
>>> +%macro STORE_4_WORDS_SSE4 6
>>> + pextrw %1, %5, %6+0
>>> + pextrw %2, %5, %6+1
>>> + pextrw %3, %5, %6+2
>>> + pextrw %4, %5, %6+3
>>> +%endmacro
>> [..]
>
> I don't recall pextrw being SSE4...
The form with a memory destination is SSE4.1.
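Below is a scalar C model of the two pextrw forms and of STORE_4_WORDS_SSE4. It is a sketch, and the type and function names are hypothetical. The SSE2 form, pextrw r32, xmm, imm8, only extracts a word into a general-purpose register, so storing it still needs a separate mov. The SSE4.1 form, with a 16-bit memory destination, writes the lane directly. From the macro body, I read %1 to %4 as destinations, %5 as the XMM register and %6 as the first word index.

```c
#include <assert.h>
#include <stdint.h>

/* An XMM register viewed as eight 16-bit lanes, lane 0 lowest. */
typedef struct {
    uint16_t w[8];
} xmm_words;

/* pextrw r32, xmm, imm8 (SSE2): the lane selected by the low three bits
 * of imm8, zero-extended into a general-purpose register. */
static uint32_t pextrw_reg(const xmm_words *x, unsigned imm8)
{
    return x->w[imm8 & 7];
}

/* pextrw m16, xmm, imm8 (SSE4.1): the same lane, written straight to
 * memory as a little-endian 16-bit store, with no GPR round trip. */
static void pextrw_mem(uint8_t *dst, const xmm_words *x, unsigned imm8)
{
    uint16_t w = x->w[imm8 & 7];
    dst[0] = (uint8_t)(w & 0xff);
    dst[1] = (uint8_t)(w >> 8);
}

/* Model of STORE_4_WORDS_SSE4 d0, d1, d2, d3, xmm, first:
 * lanes first..first+3 go to four destinations. */
static void store_4_words_sse4(uint8_t *d0, uint8_t *d1, uint8_t *d2,
                               uint8_t *d3, const xmm_words *x,
                               unsigned first)
{
    assert(first <= 4); /* lanes first..first+3 must exist */
    pextrw_mem(d0, x, first + 0);
    pextrw_mem(d1, x, first + 1);
    pextrw_mem(d2, x, first + 2);
    pextrw_mem(d3, x, first + 3);
}
```

Passing 0 or 4 as the start index selects the low or high half of the register, so an SSE4 build can store eight rows from a single XMM register with two macro invocations.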