[MPlayer-dev-eng] vo_gl PBO patch ..

Tue Apr 29 20:55:37 CEST 2008

On Tuesday 29 April 2008 12:24:28 Reimar Döffinger wrote:
> On Tue, Apr 29, 2008 at 11:51:20AM -0600, Sven Gothel wrote:
> > fixing a minor possible bug in case bytes-per-pixel!=1
> > in gl_common.c
> > 
> > btw .. would anybody like to review this ?
> > thank you.
> 
> I have very little time right now, but I will look at it (this
> is not a complete review).
> I do quite dislike the code duplication in the draw_slice function
> though, 
it is not necessary, and the distiction can be don within the Upload function,
which can be unified as well .. yeah right, I guess that's better.

> and it is quite unclear which part of the changes make the 
> biggest difference, e.g. if implementing MP_IMGTYPE_IP and
> MP_IMGTYPE_IPB in get_image would be enough.

btw .. the original PBO usage doesn't work, 
ie is against the PBO spec.
get_image:
	- maps the PBO

.. PBO mem content changed by video codec ..

draw_image:
	- upload data to texture, glTexSubImage2D
	- unmap the PBO

but it should be:
	- map memory
	- change data
	- unmap memory
	- data upload
or am I wrong ?
I have tested both, but only the latter works on other than NVidia.

IMHO the original fast PBO path should be removed,
well .. it doesn't hit anyway, since the requirements for get_image -> PBO,
are never met (as much as I have tested it).

> Also the patch changing mp_msg calls at the same time and converting
> existing code to use the gl...PBO wrapper functions (which so far seem
> a bit like overkill to me anyway) make this patch quite hard to
> understand.

I will unify the Upload functionality ..

> In addition, I have some doubts if using more than one PBO is a good
> idea.

Well, it works and it is much faster.
I.e. creating a buffer object is expensive,
switching is cheap.

> 
> > @@ -646,16 +666,30 @@
> >  static int draw_slice(uint8_t *src[], int stride[], int w,int h,int x,int y)
> >  {
> >    mpi_flipped = (stride[0] < 0);
> > -  glUploadTex(gl_target, gl_format, gl_type, src[0], stride[0],
> > -              x, y, w, h, slice_height);
> > -  if (image_format == IMGFMT_YV12) {
> > -    ActiveTexture(GL_TEXTURE1);
> > -    glUploadTex(gl_target, gl_format, gl_type, src[1], stride[1],
> > -                x / 2, y / 2, w / 2, h / 2, slice_height);
> > -    ActiveTexture(GL_TEXTURE2);
> > -    glUploadTex(gl_target, gl_format, gl_type, src[2], stride[2],
> > -                x / 2, y / 2, w / 2, h / 2, slice_height);
> > -    ActiveTexture(GL_TEXTURE0);
> > +  if(!use_pboDMA) {
> > +      glUploadTex(gl_target, gl_format, gl_type, src[0], stride[0],
> > +                  x, y, w, h, slice_height);
> > +      if (image_format == IMGFMT_YV12) {
> > +        ActiveTexture(GL_TEXTURE1);
> > +        glUploadTex(gl_target, gl_format, gl_type, src[1], stride[1],
> > +                    x / 2, y / 2, w / 2, h / 2, slice_height);
> > +        ActiveTexture(GL_TEXTURE2);
> > +        glUploadTex(gl_target, gl_format, gl_type, src[2], stride[2],
> > +                    x / 2, y / 2, w / 2, h / 2, slice_height);
> > +        ActiveTexture(GL_TEXTURE0);
> > +      }
> 
> Reindentation (aka "cosmetics") may not be mixed with functional
> changes.

Yeah .. come on :)
Next time I will squash the white spaces from the review ..
Seriously, the block shifted one right ..

> 
> > @@ -882,6 +949,17 @@
> >                 "Use -vo gl:nomanyfmts if playback fails.\n");
> >      mp_msg (MSGT_VO, MSGL_V, "[gl] Using %d as slice height "
> >               "(0 means image height).\n", slice_height);
> > +
> > +    switch(use_pboDMA) {
> > +        case  1:
> > +            gl_pboDMA = PBO_XFER_MULTIPLE_MEMCPY;
> > +            break;
> > +        case  2:
> > +            gl_pboDMA = PBO_XFER_SINGLE_MEMCPY;
> > +            break;
> > +        default:
> > +            gl_pboDMA = PBO_XFER_DISABLED;
> > +    }
> 
> Decoupling UI and internal code is a good idea, but it also bloats the
> code. I would prefer if the user would just set gl_pboDMA directly.
> If ever a proper separation is desired, it would be better to extend
> the parser to support it directly.

I thought about it, ie casting int -> enum.
Ok .. will do.

> 
> > +  if ( pbo->mode==PBO_XFER_SINGLE_MEMCPY && h * stride > pbo->sz ) {
> > +    pbo->mode=PBO_XFER_MULTIPLE_MEMCPY;
> > +    mp_msg (MSGT_VO, MSGL_INFO, "[gl] glUploadTexPBO %d: PBO single -> multiple memcpy (video stride)!\n", 
> > +        pbo->name);
> 
> If you intend to check if there is a stride applied, "h * stride >
> pbo->sz" is the wrong condition, "stride > w * bytes_per_pixel" is the
> right one.

No, here we check if a single memcpy would not exceed the texture bounds.
For multiple memcpy, yes, you are right, and we do this later on ..
> 
> > +  if ( h * w > pbo->sz )
> > +  {
> > +    mp_msg (MSGT_VO, MSGL_INFO, "[gl] glUploadTexPBO %d: %d/%d %dx%d, sz %d, slic %d, strd %d, bpp %d -> PBO Disabled (video size)!\n", 
> > +      pbo->name, x,y, w,h,  pbo->sz, slice, stride, bytesPerPixel);
> > +    glDestroyPBO(pbo);
> > +
> > +    // fallback ..
> > +    glUploadTex(target, format, type, dataptr, stride, x, y, w, h, slice);
> > +    return;
> > +  }
> 
> Either this is possible, then it should be supported properly (not just
> via fallback) or it should just not do anything, according to the motto:
> better fail hard than continue with an unexpected performance
> degradation - users are unlikely to notice the message, failing will
> ensure we will get a bug report instead of a user just thinking "oh
> MPlayer is so slow..." and never reporting it.

hmm .. as you wish, so let's fail hard :)

> 
> > +  if((pbo->test&1)==0) {
> > +      mp_msg (MSGT_VO, MSGL_V, "[gl] glUploadTexPBO %d: %d/%d %dx%d, sz %d, slic %d, strd %d, bpp %d - mode %d\n", 
> > +        pbo->name, x,y, w,h,  pbo->sz, slice, stride, bytesPerPixel, pbo->mode);
> > +      pbo->test|=1;
> > +  }
> 
> IMO that has no place in productive code.
right :)

> 
> > +  if(pbo->mode==PBO_XFER_SINGLE_MEMCPY) {
> > +    memcpy(pbo->mem, data,  h*stride);
> > +    rowlenBytes = stride;
> > +  } else if(pbo->mode==PBO_XFER_MULTIPLE_MEMCPY) {
> > +    for(i=0;i<h;i++)
> > +    {
> > +        memcpy(pbo->mem+(i*pbo->stride), data+(i*stride), w*bytesPerPixel);
> > +    }
> 
> You should probably look at fast_memcpy, mem2agpcpy, memcpy_pic and
> mem2agpcpy_pic

I will look at these, thank you.

However, the default memcpy today will make the right choice,
ie: CPU unroled inline .., CPU function call, DMA xfer ..

Thanks lot for your review, Reimar.

Cheers, Sven
> 
> Greetings,
> Reimar Döffinger
> _______________________________________________
> MPlayer-dev-eng mailing list
> MPlayer-dev-eng at mplayerhq.hu
> https://lists.mplayerhq.hu/mailman/listinfo/mplayer-dev-eng
> 

-- 
health & wealth
mailto:sgothel at jausoft.com ; www  : http://www.jausoft.ca ; pgp: http://www.jausoft.com/gpg/
land : +1 (780) 637 3842 ; cell: +1 (780) 952 4481
Timezone MST: EST-2, UTC-7, CET-8 ; MDT: EDT-2, UTC-6, CEDT-8