[FFmpeg-devel] [PATCH] VP8 luma(16) inner-MB H/V loopfilter MMX/SSE2

Mon Jul 19 18:57:11 CEST 2010

Hi,

On Sun, Jul 18, 2010 at 4:11 PM, Loren Merritt <lorenm at u.washington.edu> wrote:
> On Sun, 18 Jul 2010, Ronald S. Bultje wrote:
>>
>> On Sun, Jul 11, 2010 at 2:47 PM, Loren Merritt <lorenm at u.washington.edu> wrote:
>>>
>>> On Sun, 11 Jul 2010, Michael Niedermayer wrote:
>>>>
>>>> On Sun, Jul 11, 2010 at 04:52:04PM +0000, Loren Merritt wrote:
>>>>>
>>>>> Rather than special-casing most of the functions, we at x264 declared that
>>>>> Core1 doesn't have sse2, and changed the cpuid parser accordingly.
>>>>> If you want to support the few cases where sse2 is slightly faster than
>>>>> mmx, I recommend picking a different flag for that and applying it only
>>>>> when you've tested on Core1, so that FF_MM_SSE2 can be trusted to dwim in
>>>>> the usual case.
>>>>>
>>>>>  cpuid.c |   14 +++++++++++++-
>>>>>  1 file changed, 13 insertions(+), 1 deletion(-)
>>>>> 7ba0916766645e2de9330e9ba8f30d815da14c91  cpuid.diff
>>>>
>>>> do we have any float SSE2 code that this could affect negatively?
>>>> if not, I am ok with this patch
>>>
>>> ff_lpc_compute_autocorr_sse2
>>
>> The attached patch implements FF_MM_SSE2SLOW and FF_MM_SSE3SLOW for this purpose.
>
> ok if you've tested it (I haven't).

What would I test, exactly? I've tested that the function pointers
don't get assigned on my Core1 (which is the expected behaviour) unless
I put them under SSE2SLOW|SSE2. I have also confirmed the speed
improvement for all VP8 functions under SSE2SLOW|SSE2, but haven't
tested the LPC function (I am taking your word for that one). I also
haven't tested whether any other functions could benefit from SSE2SLOW.

Michael, are you ok with the new flags FF_MM_SSE2SLOW and FF_MM_SSE3SLOW?

Ronald