[MPlayer-dev-eng] vo_gl PBO patch ..

Reimar Döffinger Reimar.Doeffinger at stud.uni-karlsruhe.de
Sat May 3 15:10:17 CEST 2008


On Wed, Apr 30, 2008 at 03:48:40PM -0600, Sven Gothel wrote:
> On Wednesday 30 April 2008 14:44:40 Reimar Döffinger wrote:
> > On Wed, Apr 30, 2008 at 02:05:57PM -0600, Sven Gothel wrote:
> > > My patch just adds a little PBO helper tool,
> > > to ease the PBO usage, utilizing a bit OO style,
> > > so we don't have to bother with all the floating around variables.
> > 
> > Reducing the number of global variables is a good goal, but doing this
> > in a patch that fixes a performance issue makes it really hard to
> > understand where the performance issue is, what causes it, and what is
> > the simplest way to fix it.
> 
> It depends. Proper management of multiple PBOs calls for a little toolkit.

Though your suggestion makes PBOs dependent on textures. That is already
problematic for get_image (where the U and V planes may be required to
follow directly after the Y plane), and it works badly or possibly
not at all for e.g. vertices, or when one texture needs
multiple PBOs (as for MPEG codecs, where the additional PBOs would be
needed to store the reference frames for direct rendering - that is
assuming OpenGL can provide PBO buffers that can be read fast enough).
Which is why I would tend more towards a malloc-like interface to PBOs
if I wrote a framework.
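
A minimal sketch of what such a malloc-like interface could look like,
assuming one large buffer object that hands out byte offsets instead of
tying each PBO to a texture. All names here (pbo_pool, pbo_alloc, ...) are
hypothetical and not part of vo_gl; the actual OpenGL buffer creation and
mapping are left out, only the offset bookkeeping is shown.

```c
#include <stddef.h>

/* Hypothetical pool describing one large buffer object. The offsets it
 * returns are what one would use as the data offset while that buffer
 * is bound; no OpenGL calls are made in this sketch. */
typedef struct {
    size_t size;   /* total bytes in the buffer object */
    size_t used;   /* bytes handed out so far */
} pbo_pool;

#define PBO_ALLOC_FAIL ((size_t)-1)

static void pbo_pool_init(pbo_pool *p, size_t size) {
    p->size = size;
    p->used = 0;
}

/* Simple bump allocation: returns an offset aligned to `align`
 * (assumed to be a power of two), or PBO_ALLOC_FAIL if it does not fit. */
static size_t pbo_alloc(pbo_pool *p, size_t bytes, size_t align) {
    size_t start = (p->used + align - 1) & ~(align - 1);
    if (start > p->size || bytes > p->size - start)
        return PBO_ALLOC_FAIL;
    p->used = start + bytes;
    return start;
}

/* Releases everything at once, e.g. after a frame has been uploaded. */
static void pbo_pool_reset(pbo_pool *p) {
    p->used = 0;
}
```

With something like this, planes, vertex data or reference frames could
each request a region without knowing about each other. A real version
would need per-region free and some handling of buffers still in use by
the GPU, which this sketch ignores.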

> > How do you define DMA? I learned DMA as data transfer between (usually)
> > main memory and a peripheral device under the control of the device,
> > thus leaving the CPU free.
> > To me, the memcpy used in this case seems to fail all these criteria...
> 
> DMA can be any memory transfer not handled by the CPU in PIO mode.
> On a system, there are many DMA controllers available.
> 
> The GPU one usually is able to handle:
> 	GPUMem <-> SystemMem

Ok.

> 	GPUMem <-> GPUMem

Where did you get this information from? I don't see much sense and
quite a few technical problems for a GPU memory to GPU memory DMA engine
(unless you count the GPU itself as a DMA engine).

> maybe even (AMD GPU's can do this for sure):
> 	SystemMem <-> SystemMem

Huh? Why would they implement that, doing memory to memory copies over
the PCIe bus? That would be really slow...

> The system memory controller under some architectures is able to do the same,
> at least SystemMem <-> SystemMem.

I'd really like to see some proof of that. Unless you consider what the
Cell does as system memory to system memory DMA, though usually it is
described as system memory to local cache transfer (and the other way
round).

> This may be utilized by memcpy; to my knowledge it is on ia/x86 platforms.

No, certainly not. You can check the glibc sources. Or just disassemble
your system's libc.

> Even though memcpy would be performed in PIO mode,
> using PBO's for glTexSubImage ensures at least this one is using DMA.

For all I know, there actually is no guarantee that PBO transfer
operations will use DMA. They usually will, but e.g. if you create so
many PBOs that they no longer fit in the DMA area, they will use a normal
copy. It might be that the problem of fixed-size DMA areas
has finally been solved; I haven't paid attention to that, and there
certainly are enough possible solutions.

> > E.g. the current PBO handling code uses 1 PBO for YV12 mode, your patch
> > changes this to use 3. Does this have any effect on speed?
> No.
> It just ensures that we set up the PBO only once.
> Of course, instead of the 3 (full-size + 2 half-size),
> you may create and use the fullsize only, since they are used sequentially.
> That would be fine .. IMHO.

I meant one 3/2 size PBO, as it is used by get_image. Either way, I too
doubt it makes a difference in speed.
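
To illustrate the "one 3/2 size PBO" layout, here is a small sketch of how
the three planes could be placed in a single buffer. It assumes even
width and height, tightly packed planes (stride equal to width) and the
chroma planes following directly after Y in U, V order; the struct and
function names are made up for this example.

```c
#include <stddef.h>

/* Hypothetical description of a planar 4:2:0 frame in one buffer. */
typedef struct {
    size_t y_off, u_off, v_off;  /* byte offsets of the three planes */
    size_t total;                /* bytes needed for the whole frame */
} yv12_layout;

/* Assumes even w and h and no padding between lines or planes.
 * The U-before-V order is an assumption of this sketch. */
static yv12_layout yv12_single_buffer(size_t w, size_t h) {
    yv12_layout l;
    size_t chroma = (w / 2) * (h / 2);  /* each chroma plane is 1/4 of Y */
    l.y_off = 0;
    l.u_off = w * h;                    /* chroma starts right after Y */
    l.v_off = l.u_off + chroma;
    l.total = l.v_off + chroma;         /* = w * h * 3 / 2 */
    return l;
}
```

For a 640x480 frame this gives a total of 460800 bytes, i.e. 3/2 times
the size of the Y plane.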

> Regarding the slicing part, I see that you always copy the whole video frame.
> So maybe we can drop the previous slices/copies and do it only on the whole image,
> ie when y+slice==h ?

Only if it is carefully benchmarked and shown not to be slower on a wide
range of systems.

> > > Well, it seems it is simply a bug in the current ATI driver,
> > > which forces us to use multiple memcpys, i.e. shape the PBO memory
> > > so no NPOT stride happens - even though the driver claims to support NPOT.
> > 
> > What is NPOT supposed to mean? My only interpretation of NPOT would be
> > non-power-of-two 
> right. Actually they use it for non power of two textures,
> but we get the drift, right.

So you mean this is only a problem with rectangle=2? I haven't seen that
make a useful difference anyway... If that's all, I'd prefer to leave it
broken.

Greetings,
Reimar Döffinger


