[FFmpeg-devel] [PATCH] VP8 luma(16) inner-MB H/V loopfilter MMX/SSE2

Michael Niedermayer michaelni
Sun Jul 11 18:59:02 CEST 2010


On Sun, Jul 11, 2010 at 04:52:04PM +0000, Loren Merritt wrote:
> On Sun, 11 Jul 2010, Ronald S. Bultje wrote:
>
>> You'll notice that the sse2 is significantly slower here, my rough
>> guess is that this is because of my shitty CPU which pretty much
>> emulates xmm-ops through mmx-ops, so it doesn't add a lot of benefit
>> other than not having to set up the loop for doing the second 8 pixels,
>> combined with the added complexity of an 8x16 transpose before the
>> actual filter. I'm betting that on an actual sse2-supporting CPU
>> (Jason?), this would still be faster, but we might want to put this
>> under a FF_MM_SSE2_NOT_SHITTY flag or something along those lines. If
>> you think my code is shitty, comments are welcome also. ;-).
>
> Rather than special-casing most of the functions, we at x264 declared that 
> Core1 doesn't have sse2, and changed the cpuid parser accordingly.
> If you want to support the few cases where sse2 is slightly faster than 
> mmx, I recommend picking a different flag for that and applying it only 
> when you've tested on Core1, so that FF_MM_SSE2 can be trusted to dwim in 
> the usual case.
>
> --Loren Merritt

>  cpuid.c |   14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
> 7ba0916766645e2de9330e9ba8f30d815da14c91  cpuid.diff

Do we have any float SSE2 code that this could affect negatively?
If not, I am OK with this patch.
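
For readers without the attachment, here is a rough sketch of the kind of
cpuid special-casing Loren describes. It is not the contents of cpuid.diff.
The flag names and values, the adjust_cpu_flags() helper, and the
CPU_FLAG_SSE2_SLOW bit are all hypothetical and exist only for illustration.
The family/model decoding is the usual simplified sum of the base and
extended fields of CPUID leaf 1 EAX, and the model list (9 and 13 for
Pentium M, 14 for Core1) is an assumption about which chips run SSE2 slower
than MMX.

```c
#include <stdint.h>

/* Hypothetical capability bits for this sketch only. */
#define CPU_FLAG_MMX       0x1u
#define CPU_FLAG_SSE       0x2u
#define CPU_FLAG_SSE2      0x4u
#define CPU_FLAG_SSE2_SLOW 0x8u  /* SSE2 exists but is usually slower than MMX */

/* Decode family and model from CPUID leaf 1 EAX (simplified: base and
 * extended fields are simply summed). */
static unsigned cpu_family(uint32_t eax)
{
    return ((eax >> 8) & 0xf) + ((eax >> 20) & 0xff);
}

static unsigned cpu_model(uint32_t eax)
{
    return ((eax >> 4) & 0xf) + ((eax >> 12) & 0xf0);
}

/* On Intel family 6, models 9, 13 and 14, drop the SSE2 bit so that generic
 * SSE2 code paths are never selected, and record the capability under a
 * separate bit that only individually benchmarked functions would test. */
static unsigned adjust_cpu_flags(int is_intel, uint32_t eax, unsigned flags)
{
    unsigned model = cpu_model(eax);

    if (is_intel && cpu_family(eax) == 6 &&
        (model == 9 || model == 13 || model == 14) &&
        (flags & CPU_FLAG_SSE2)) {
        flags &= ~CPU_FLAG_SSE2;
        flags |= CPU_FLAG_SSE2_SLOW;
    }
    return flags;
}
```

With something like this, a function that has actually been measured to be
faster with SSE2 on Core1 could check the separate bit, while everything
else keeps testing only the plain SSE2 flag.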


[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Why not whip the teacher when the pupil misbehaves? -- Diogenes of Sinope