[FFmpeg-devel] [PATCH] make building swscale rgb template conditional

Michael Niedermayer michaelni
Sun Sep 5 17:07:16 CEST 2010


On Sun, Sep 05, 2010 at 03:22:06AM -0300, Ramiro Polla wrote:
> On Thu, Aug 26, 2010 at 4:35 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> > On Thu, Aug 26, 2010 at 06:24:37AM +0200, Luca Barbato wrote:
> >> On 08/25/2010 04:22 PM, Michael Niedermayer wrote:
> >> > On Wed, Aug 25, 2010 at 04:14:21PM +0200, Luca Barbato wrote:
> >> >> On 08/25/2010 10:18 AM, Michael Niedermayer wrote:
> [...]
> >> >>> SSE2 is always disabled after this patch
> >> >>> i dont think it was always disabled before, because the code likely was
> >> >>> tested when it was written
> >> >>
> >> >> Apparently not and that was the question I was waiting for an answer...
> >> >
> >> > it did not work (in _mplayer_) when it was commited?
> >>
> >> From a quick look, mplayer has HAVE_SSE2, ffmpeg has HAVE_SSE, so
> >> in mplayer interleaveBytes gets SSE2 instead of mmx no matter what, on
> >> ffmpeg the opposite...
> >
> > hmm so it was working in mplayer (without runtime detect) but not ffmpeg ...
> > that should be fixed
> 
> No cpu acceleration has worked for rgb2rgb in FFmpeg for quite some
> time. Since rgb2rgb uses global function pointers, and
> sws_getContext() initializes them under "if (!rgb15to16)
> sws_rgb2rgb_init(flags);", they only get initialized once, which in
> ffmpeg.c is in:
> sws_opts = sws_getContext(16,16,0, 16,16,0, sws_flags, NULL,NULL,NULL);
> where sws_flags is SWS_BICUBIC and specifies no cpu caps.
> 
> Using global function pointers is obviously a bad idea. I'd suggest we
> drop exposing rgb2rgb but I faintly remember Michael being against
> that because some programs already use that interface and we'd have to
> bump major if we want to remove it, and also because it's useful to
> have a simpler alternative to get rgb2rgb functions than going through
> the swscale interface. Also I'd like to point out that on shared
> builds these global function pointers no longer work since only
> swscale_*, sws_*, and ff_* stuff is exported.
> 
> If we must keep the rgb2rgb interface (not the global pointers, these
> should definitely go away at next major bump), I suggest we either:

> 1- create a simpler function like sws_getRGBContext that only takes
> arguments relative to rgb2rgb (like src and dst pixel format), and
> that can be used with sws_scale()

i see no sense in this ...


> 2- create an rgb2rgb context which the user can initialize and then
> use rgb->rgb24tobgr32() and such.

maybe i dont know


> 
> Michael, how do you suggest we proceed regarding the rgb2rgb interface?

yes, what do the users of rgb2rgb think?
do they want to keep an alternative way to access it or can the main sws
api be used?
also we should consider improving the main sws api if someone has an idea
in that direction though its quite simple API already...



> 
> And another point, even if those optimizations did work, sse2
> currently code only occurs in interleaveBytes, and on that function
> both mmx2 and sse2 are *slower* than the mmx code. I asked Jason why
> that could happen and he said it's because of movntq, and I should be
> careful when benchmarking. I tried many kinds of tests:
> - converting the same input to the same output many times and not
> using the output after that
> - converting the same input to the same output many times and running
> crc over output
> - converting different input to same output many times and running crc
> over output
> - converting different input to different output many times and not
> using the output after that
> - converting different input to different output many times and
> running crc over output
> none of those made mmx2 or sse2 faster than mmx. I also tried with
> many widths/heights. What scenario should I test to see a benefit in
> using sse2 here? (btw this was all done using a core2duo)

some things to keep in mind. The input array should be initialized explicitly
due to copy on write OS behavior for what malloc() returns
the output array should be bigger than the L2 cache size
make sure linesize and width are multiples of 16 and pointers are aligned
also check if the prefetch or the movnt cause the problem by commenting the
prefetch out






> 
> Going back to my original patch (0003), it did not change
> functionality (since sse2 didn't work on ffmpeg anyways). Is it ok to
> apply it before working on the other issues?

i primarely care about things being fixed and no app using swscale being
broken ...

[....]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Old school: Use the lowest level language in which you can solve the problem
            conveniently.
New school: Use the highest level language in which the latest supercomputer
            can solve the problem without the user falling asleep waiting.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 198 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100905/50871e58/attachment.pgp>



More information about the ffmpeg-devel mailing list