[FFmpeg-devel] [PATCH] make building swscale rgb template conditional

Ramiro Polla ramiro.polla
Tue Sep 14 04:43:34 CEST 2010

On Sun, Sep 5, 2010 at 12:07 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> On Sun, Sep 05, 2010 at 03:22:06AM -0300, Ramiro Polla wrote:
>> None of those made mmx2 or sse2 faster than mmx. I also tried with
>> many widths/heights. What scenario should I test to see a benefit from
>> using sse2 here? (btw, this was all done on a core2duo)
> Some things to keep in mind: the input array should be initialized explicitly,
> due to the copy-on-write OS behavior for what malloc() returns; the output
> array should be bigger than the L2 cache size; make sure linesize and width
> are multiples of 16 and pointers are aligned. Also check whether the prefetch
> or the movnt causes the problem by commenting the prefetch out.
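
A minimal C sketch of benchmark buffer setup that follows the checklist above. It assumes a packed format with a fixed number of bytes per pixel and an L2 size supplied by the caller. `BenchBuf`, `bench_buf_alloc()`, `bench_buf_exceeds_l2()` and `bench_buf_free()` are hypothetical helpers, not swscale API. `aligned_alloc()` is standard C11. The memset matters because a large fresh malloc() block is typically backed by shared zero pages until it is written, so an untouched input would not exercise real memory traffic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical benchmark buffer; not part of swscale. */
typedef struct {
    uint8_t *data;
    size_t linesize; /* bytes per row, rounded up to a multiple of 16 */
    size_t size;     /* linesize * height */
} BenchBuf;

static size_t align16(size_t n)
{
    return (n + 15) & ~(size_t)15;
}

/* Allocate a 16-byte-aligned buffer and touch every byte, so the timed
 * loop sees real pages rather than copy-on-write zero pages. The caller
 * should still pick a width that is itself a multiple of 16. */
static int bench_buf_alloc(BenchBuf *b, int width, int height, int bpp, uint8_t fill)
{
    assert(width > 0 && height > 0 && bpp > 0);
    b->linesize = align16((size_t)width * (size_t)bpp);
    b->size     = b->linesize * (size_t)height;
    /* C11 aligned_alloc wants size to be a multiple of the alignment;
     * linesize is a multiple of 16, so size is as well. */
    b->data = aligned_alloc(16, b->size);
    if (!b->data)
        return -1;
    memset(b->data, fill, b->size);
    return 0;
}

/* True if the buffer cannot fit in an L2 cache of l2_bytes
 * (ASSUMPTION: the caller knows the L2 size). */
static int bench_buf_exceeds_l2(const BenchBuf *b, size_t l2_bytes)
{
    return b->size > l2_bytes;
}

static void bench_buf_free(BenchBuf *b)
{
    free(b->data);
    b->data = NULL;
}
```

For the output buffer, assert `bench_buf_exceeds_l2()` before timing so that the non-temporal stores are measured in the regime where they are expected to help.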

Prefetch made no difference, so it's movntq. I tried several sizes:
mmx was faster at 512x512, they were almost on par at 1024x1024, and
beyond that sse2 started being faster. Would it make sense to have
sws_getContext() get the L2 cache size and choose which function to
use based on whether the image fits in it?
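
The heuristic proposed above could look roughly like the sketch below. `ConvVariant` and `choose_variant()` are hypothetical names; swscale has no such enum. The sketch compares only the destination footprint (linesize * height) against L2, which is one reading of "the image fits". How the L2 size is obtained is left to the caller (on glibc, `sysconf(_SC_LEVEL2_CACHE_SIZE)` is one non-portable option). The threshold is an unverified assumption drawn from the core2duo measurements above, not a tuned value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical variants; names are for illustration only. */
typedef enum {
    VARIANT_MMX,    /* ordinary stores: faster above while the output fit in cache */
    VARIANT_SSE2_NT /* non-temporal (movnt*) stores: faster above for large outputs */
} ConvVariant;

/* Pick a variant from the destination footprint. l2_bytes == 0 means the
 * cache size is unknown, in which case the MMX path is kept (ASSUMPTION). */
static ConvVariant choose_variant(int dst_linesize, int dst_height, size_t l2_bytes)
{
    assert(dst_linesize >= 0 && dst_height >= 0);
    size_t footprint = (size_t)dst_linesize * (size_t)dst_height;
    if (l2_bytes == 0)
        return VARIANT_MMX;
    return footprint > l2_bytes ? VARIANT_SSE2_NT : VARIANT_MMX;
}
```

Making the choice once at context creation keeps the per-frame cost at a single function-pointer call; the drawback is that the crossover point depends on the CPU and on what else competes for the cache.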

>> Going back to my original patch (0003): it did not change
>> functionality (since sse2 didn't work in ffmpeg anyway). Is it ok to
>> apply it before working on the other issues?
> I primarily care about things being fixed and no app using swscale being
> broken...

New patch attached.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dont_misuse_have_xxx.diff
Type: application/octet-stream
Size: 17152 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100913/74029704/attachment.obj>
