[FFmpeg-devel] [PATCH] make building swscale rgb template conditional
Tue Sep 14 04:43:34 CEST 2010
On Sun, Sep 5, 2010 at 12:07 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> On Sun, Sep 05, 2010 at 03:22:06AM -0300, Ramiro Polla wrote:
>> none of those made mmx2 or sse2 faster than mmx. I also tried with
>> many widths/heights. What scenario should I test to see a benefit in
>> using sse2 here? (btw this was all done using a core2duo)
> Some things to keep in mind: the input array should be initialized explicitly,
> because of the OS's copy-on-write behavior for the memory malloc() returns.
> The output array should be bigger than the L2 cache size.
> Make sure linesize and width are multiples of 16 and the pointers are aligned.
> Also check whether the prefetch or the movnt causes the problem by commenting
> the prefetch out.
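
For reference, the benchmarking advice quoted above could be set up roughly like this. It is only a sketch: aligned_linesize() and alloc_touched() are hypothetical helper names, not swscale functions, and the 16-byte alignment is taken from the "multiples of 16" remark. The memset() writes to every byte, so the OS has to back the buffer with real pages before timing starts.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define BENCH_ALIGN 16

/* Round a row size in bytes up to the next multiple of 16. */
static int aligned_linesize(int row_bytes)
{
    return (row_bytes + BENCH_ALIGN - 1) & ~(BENCH_ALIGN - 1);
}

/*
 * Hypothetical helper: allocate `size` bytes, return a 16-byte-aligned
 * pointer into the allocation, and write `fill` to every byte so the pages
 * are really committed (fresh malloc() memory may be copy-on-write).
 * The pointer to pass to free() is stored in *raw.
 */
static uint8_t *alloc_touched(size_t size, uint8_t fill, void **raw)
{
    uint8_t *base = malloc(size + BENCH_ALIGN - 1);
    uint8_t *p;

    if (!base)
        return NULL;
    *raw = base;
    /* Bump the pointer up to the next 16-byte boundary. */
    p = (uint8_t *)(((uintptr_t)base + BENCH_ALIGN - 1) &
                    ~(uintptr_t)(BENCH_ALIGN - 1));
    memset(p, fill, size);
    return p;
}
```

To follow the remaining advice, the output buffer would be sized as aligned_linesize(width * bytes_per_pixel) * height and chosen so that it exceeds the L2 cache size of the test machine.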
Commenting out the prefetch made no difference, so it's movntq. I tried
several sizes: mmx was faster at 512x512, the two were almost on par at
1024x1024, and sse2 was faster at larger sizes. Would it make sense to
have sws_getContext() get the L2 cache size and determine which
function to use based on whether the image fits in it?
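
A minimal sketch of the selection being asked about, assuming the caller already knows the L2 cache size. Nothing here exists in swscale: pick_store_path(), the enum, and the l2_cache_bytes parameter are made up for illustration, and using exactly "output bigger than L2" as the threshold is an assumption that would need tuning against measurements.

```c
#include <stddef.h>

/* Hypothetical names, for illustration only. */
enum store_path {
    STORE_CACHED,       /* ordinary stores, e.g. the plain mmx path   */
    STORE_NONTEMPORAL   /* cache-bypassing stores (the movnt* family) */
};

/*
 * Choose a store strategy from the size of the output image.
 * l2_cache_bytes == 0 means "unknown"; in that case fall back to the
 * cached path, which was the faster one for small images.
 */
static enum store_path pick_store_path(int width, int height,
                                       int bytes_per_pixel,
                                       size_t l2_cache_bytes)
{
    size_t out_bytes = (size_t)width * (size_t)height *
                       (size_t)bytes_per_pixel;

    if (l2_cache_bytes == 0)
        return STORE_CACHED;
    return out_bytes > l2_cache_bytes ? STORE_NONTEMPORAL : STORE_CACHED;
}
```

With a 4 MiB L2 and 4 bytes per pixel, this picks the cached path for 512x512 (1 MiB) and the non-temporal path for 2048x2048 (16 MiB). How the cache size would actually be queried is left open here.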
>> Going back to my original patch (0003), it did not change
>> functionality (since sse2 didn't work on ffmpeg anyways). Is it ok to
>> apply it before working on the other issues?
> I primarily care about things being fixed and no app using swscale being
> broken ...
New patch attached.