[FFmpeg-devel] [PATCH] VP8 luma(16) inner-MB H/V loopfilter MMX/SSE2

Ronald S. Bultje rsbultje
Sun Jul 11 22:24:06 CEST 2010


Hi Eli

On Jul 11, 2010, at 3:52 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
> On Sun, Jul 11, 2010 at 8:53 AM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
>> You'll notice that the sse2 is significantly slower here; my rough
>> guess is that this is because of my shitty CPU, which pretty much
>> emulates xmm-ops through mmx-ops, so it doesn't add a lot of benefit
>> other than not having to set up the loop for doing the second 8
>> pixels, combined with the added complexity of an 8x16 transpose
>> before the actual filter. I'm betting that on an actual
>> sse2-supporting CPU (Jason?), this would still be faster, but we
>> might want to put this under a FF_MM_SSE2_NOT_SHITTY flag or
>> something along those lines. If you think my code is shitty,
>> comments are welcome also. ;-).
>
> On my Mobile Core i5, the SSE2 version has the expected performance
> gain vs. the mmxext version (55% of the time for the vertical version,
> 65% of the time for the horizontal version).

Thanks for taking the time to test; this is approximately what I'd
expect.
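
For anyone following along, the transpose mentioned in the quoted text can
be sketched in scalar C. This is only an illustration under assumptions: the
helper name transpose_16x8 and the dst layout are hypothetical and are not
taken from the patch, which does this step in SIMD. The idea is that the
8 pixels straddling a vertical edge sit next to each other in a row, so to
filter 16 rows at once each of those 8 positions has to be gathered into its
own 16-byte vector.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper (not from the patch): read 16 rows of 8 pixels
 * starting at src and store them transposed, so that dst[x] holds the
 * x-th pixel of every row. Each dst[x] is 16 bytes, i.e. the amount of
 * data one xmm register would hold. */
static void transpose_16x8(uint8_t dst[8][16], const uint8_t *src,
                           ptrdiff_t stride)
{
    for (int y = 0; y < 16; y++)       /* 16 rows of the macroblock edge */
        for (int x = 0; x < 8; x++)    /* 8 pixels across the edge */
            dst[x][y] = src[y * stride + x];
}
```

With mmx registers the same data needs two passes of 8 rows each, which is
the loop setup the sse2 version avoids at the cost of a larger transpose.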

Watching the game now; I'll look at your other comments later. Thanks
for those also.

Ronald


