[MPlayer-dev-eng] patch for quadbuffered stereo using "-vo gl2:stereo"

Thu Nov 29 19:27:47 CET 2007

Hello,
On Thu, Nov 29, 2007 at 11:35:09AM -0600, Stuart Levy wrote:
[...]
> One possibility: it could be that the textures are loaded in
> non-sequential order, such that cache size matters.

Well, a quick oprofile still shows over 30% of time spent in libc's
memcpy which for some reason nVidia still insists on using, despite the
fact that either it is simply horrible (no MMX/SSE at all) or just
completely unsuitable for memory->AGP transfers (for which it seems to
be used at least in some cases).
That said, I checked the SVN log, and the reason why slice-height = 0 is
now the default is that the previous default of 4 was reportedly a lot
slower on some systems.
It is also slower when using direct rendering (PBOs) within MPlayer. In
your test it is not; I have no idea why (though the cache patterns in
real use are likely to be very different from those in your small test).
I'd assume that these effects (partial uploads to the same texture and
using many textures being faster) are due to the same thing, and they do
exist with PCIe, but the optimum transfer size seems to differ between
AGP and PCIe.
To summarize: when not using -dr, the following all seemed to have
similar effects for me: setting slice-height=8 for vo_gl, defining both
TEXTURE_HEIGHT and TEXTURE_WIDTH to 128, and defining only
TEXTURE_HEIGHT to 8.
Defining only TEXTURE_WIDTH to 8 seems to have less of an effect, but I
did not benchmark precisely enough to know for sure.
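
For illustration, the slice-height idea could be sketched in C roughly as
below. This is not the vo_gl code: upload_in_slices, upload_fn and
count_rows are made-up names, and the callback merely stands in for the
real per-slice texture upload (something like glTexSubImage2D). It
assumes, as with vo_gl's slice-height, that 0 means the whole image is
uploaded in one piece.

```c
#include <stddef.h>

/* Hypothetical callback standing in for a per-slice texture upload
 * (in a real GL backend this would be something like glTexSubImage2D). */
typedef void (*upload_fn)(const unsigned char *rows, int y, int num_rows,
                          int stride, void *ctx);

/* Upload an image of `height` rows in pieces of `slice_height` rows.
 * slice_height <= 0 (or larger than the image) means "whole image in
 * one piece". Returns the number of upload calls made. */
static int upload_in_slices(const unsigned char *data, int height, int stride,
                            int slice_height, upload_fn upload, void *ctx)
{
    int calls = 0;
    if (slice_height <= 0 || slice_height > height)
        slice_height = height;
    for (int y = 0; y < height; y += slice_height) {
        /* The last slice may be shorter than slice_height. */
        int rows = height - y < slice_height ? height - y : slice_height;
        upload(data + (size_t)y * stride, y, rows, stride, ctx);
        calls++;
    }
    return calls;
}

/* Example callback: only adds up how many rows were handed over. */
static void count_rows(const unsigned char *rows, int y, int num_rows,
                       int stride, void *ctx)
{
    (void)rows; (void)y; (void)stride;
    *(int *)ctx += num_rows;
}
```

With a 720-line frame, a slice height of 8 gives 90 upload calls per
plane instead of one, so the size of each individual transfer changes
while the total amount of data stays the same.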
Tested with the HD Serenity trailer from Apple's HD showcase and
-lavdopts skiploopfilter=all:threads=2 -nosound -benchmark

Greetings,
Reimar Döffinger