[MPlayer-G2-dev] slices

Fri Feb 13 00:08:17 CET 2004

On Thu, Feb 12, 2004 at 10:10:56PM +0100, Arpi wrote:
> Hi,
> 
> Slices. The thing we like and hate most.
> I think the real issue is that we don't even know what slices are!
> It's obvious that we can't live in g2 without slices, but it's not
> easy to live with them either.

Thanks for the email!

> Let's see what kind of slices we had, have or will have.
> 
> 1. libmpeg2 B-frame style (also supported by libjpeg & libpng):
> 
> The decoder decodes a part of the image (a slice, or a macroblock) to a
> small (smaller than image size) internal (allocated by decoder) buffer,
> then calls the next vf to process it (copy it out to image/screen) and
> then reuses the same buffer memory area for the next chunk of data.
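A minimal sketch of the type-1 pattern in C. The names (decode_frame_type1, draw_slice_fn, copy_slice) and the fixed 64x48 luma-only frame are hypothetical illustrations, not the G2 API. The "decoding" just writes each row's index so the result can be checked. The point is that the decoder owns only one slice-sized buffer and overwrites it as soon as the callback returns.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical callback: the next filter copies one slice out of the
 * decoder's small scratch buffer into its own image. */
typedef void (*draw_slice_fn)(void *ctx, const uint8_t *src, int stride,
                              int y, int h);

#define WIDTH   64
#define HEIGHT  48
#define SLICE_H 16   /* one row of 16x16 macroblocks */

/* Decode a luma plane slice by slice into ONE slice-sized buffer,
 * handing each slice to the next filter before overwriting it. */
static void decode_frame_type1(draw_slice_fn draw_slice, void *ctx)
{
    static uint8_t scratch[WIDTH * SLICE_H];  /* decoder-owned, < image size */
    for (int y = 0; y < HEIGHT; y += SLICE_H) {
        int h = (HEIGHT - y < SLICE_H) ? HEIGHT - y : SLICE_H;
        /* Stand-in for real decoding: fill each row with its row index. */
        for (int r = 0; r < h; r++)
            memset(scratch + r * WIDTH, (uint8_t)(y + r), WIDTH);
        draw_slice(ctx, scratch, WIDTH, y, h);  /* consumer copies it out */
        /* scratch is reused for the next slice on the next iteration */
    }
}

/* Example consumer: copies each slice into a full-size image. */
static uint8_t image[WIDTH * HEIGHT];
static void copy_slice(void *ctx, const uint8_t *src, int stride, int y, int h)
{
    uint8_t *dst = ctx;
    assert(dst != NULL);
    for (int r = 0; r < h; r++)
        memcpy(dst + (y + r) * WIDTH, src + r * stride, WIDTH);
}
```

Usage: decode_frame_type1(copy_slice, image) leaves the whole frame assembled in image even though the decoder never held more than 16 rows at a time.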

The current vp spec (and the mostly functional code) implements this by
requesting an INDIRECT type buffer from the next filter (or vo), then
using draw_slice to draw into it.

The next filter is also allowed to return a DIRECT type buffer when
INDIRECT is requested. In that case the vp layer uses a default
draw_slice implementation that just copies into the DIRECT buffer.
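Roughly, the INDIRECT/DIRECT fallback could look like this. It is a sketch under assumed names (enum buf_type, struct image_buf, vp_draw_slice, default_draw_slice) with a single plane for brevity, not the actual vp code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical buffer types mirroring the names used in the mail. */
enum buf_type { BUF_DIRECT, BUF_INDIRECT };

struct image_buf {
    enum buf_type type;
    uint8_t *planes;     /* single plane for brevity */
    int stride;
    /* For INDIRECT buffers the owning filter supplies its own draw_slice. */
    void (*draw_slice)(struct image_buf *buf, const uint8_t *src,
                       int src_stride, int x, int y, int w, int h);
};

/* Default draw_slice the vp layer falls back to when a DIRECT buffer
 * comes back: a plain row-by-row copy into the buffer memory. */
static void default_draw_slice(struct image_buf *buf, const uint8_t *src,
                               int src_stride, int x, int y, int w, int h)
{
    assert(w >= 0 && h >= 0);
    for (int r = 0; r < h; r++)
        memcpy(buf->planes + (y + r) * buf->stride + x,
               src + r * src_stride, (size_t)w);
}

/* Route a slice to whichever draw_slice applies to this buffer. */
static void vp_draw_slice(struct image_buf *buf, const uint8_t *src,
                          int src_stride, int x, int y, int w, int h)
{
    if (buf->type == BUF_INDIRECT && buf->draw_slice)
        buf->draw_slice(buf, src, src_stride, x, y, w, h);
    else
        default_draw_slice(buf, src, src_stride, x, y, w, h);
}
```

The decoder always calls vp_draw_slice the same way, so it does not need to care which buffer type the next filter actually handed back.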

Does this sound ok?

> 2. notification style (libavcodec, libmpeg2 I/P frames):
> 
> The decoder decodes to an image-sized buffer (it may be allocated by the decoder
> or it may be direct rendering), and after decoding a small area (MB or slice)
> it notifies (via callback) the next vf that this part is finished,
> so the next vf can process that part immediately, while it is still in cache.
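A sketch of the notification style, assuming a hypothetical slice_done_fn callback and a single 32x48 plane. The decoder writes straight into the image-sized buffer and, after each macroblock row, tells the next filter which rows are final. Passing NULL models a next filter that has no such callback and simply takes the whole frame later.

```c
#include <assert.h>
#include <stdint.h>

#define W  32
#define H  48
#define MB 16

/* Hypothetical notification callback: rows [y, y+h) of the shared
 * image-sized buffer are final and may be processed now. */
typedef void (*slice_done_fn)(void *ctx, uint8_t *img, int y, int h);

static void decode_frame_type2(uint8_t *img, slice_done_fn done, void *ctx)
{
    assert(H % MB == 0);
    for (int y = 0; y < H; y += MB) {
        for (int r = y; r < y + MB; r++)      /* "decode" one MB row */
            for (int x = 0; x < W; x++)
                img[r * W + x] = (uint8_t)r;
        if (done)                              /* next vf may not have one */
            done(ctx, img, y, MB);
    }
}

/* Example next filter: inverts each finished band while it is still
 * cache-hot, writing into its own output image. */
static void invert_band(void *ctx, uint8_t *img, int y, int h)
{
    uint8_t *out = ctx;
    for (int r = y; r < y + h; r++)
        for (int x = 0; x < W; x++)
            out[r * W + x] = (uint8_t)(255 - img[r * W + x]);
}
```

Unlike type 1, the decoder keeps the full frame, so the callback only says "this part is done"; it never has to copy anything out.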

I see one potential problem with this. If enough filters actually
support per-slice _processing_, then the whole filterchain will get
called for each slice. If it's long (or if the slice is small, e.g. a
single macroblock), this might thrash the instruction cache (or branch
prediction, etc.) and make it slower, rather than faster...?

Also, in principle a filter should try to _output_ an image in slices,
even if the input came all as one image.
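One way a filter could do this, sketched with hypothetical names (filter_in_bands, band_ready_fn): process an already complete input frame in horizontal bands and notify downstream after each band. Halving the pixel values is only a stand-in for real filtering, and band_h would presumably be tuned to the cache size.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical downstream notification, as with type-2 slices. */
typedef void (*band_ready_fn)(void *ctx, const uint8_t *out, int y, int h);

/* Even when the whole input frame arrives at once, filter it in horizontal
 * bands and hand each finished band downstream while it is cache-hot. */
static void filter_in_bands(const uint8_t *in, uint8_t *out, int w, int h,
                            int band_h, band_ready_fn ready, void *ctx)
{
    assert(band_h > 0);
    for (int y = 0; y < h; y += band_h) {
        int bh = (h - y < band_h) ? h - y : band_h;   /* last band may be short */
        for (int r = y; r < y + bh; r++)
            for (int x = 0; x < w; x++)
                out[r * w + x] = (uint8_t)(in[r * w + x] / 2);  /* stand-in */
        if (ready)
            ready(ctx, out, y, bh);
    }
}

/* Example consumer: counts how many rows it has been handed. */
static void count_rows(void *ctx, const uint8_t *out, int y, int h)
{
    (void)out;
    (void)y;
    *(int *)ctx += h;
}
```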

> Actually 2. is really 2.a and 2.b, for internal and DR buffers respectively,
> so we end up with 3 different kinds of slices.

Type 2 is actually three: internal buffer, DR buffer, and vp-allocated
buffer. But from the receiving filter's end, internal and vp-allocated
(EXPORT and AUTO) are the same, so it reduces to two, like you say.

> To use type-1 slices, we need to know (before decoding) whether the
> next vf supports this kind of slice.

For this, it just needs to request an INDIRECT buffer (perhaps with
special flags saying which style of slices it wants...?).

> Type-2 slices are simpler: calling the next vf's notification callback
> is optional. The next vf can provide such a callback to do some
> optimizations, or leave it out and process the whole buffer as usual.

IMO it can't be this flexible. If you're going to report slice
completion, you should have to call the slice callback for every
single part of the image. Otherwise the dest filter has to compute
which region the slice callback never got called for, and process this
at the end, which is painful...
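To illustrate the bookkeeping the destination would be stuck with, here is a hypothetical per-row coverage tracker (struct coverage, coverage_mark and coverage_missing are illustrative names only). At end of frame it has to scan for every run of rows that never got a notification and process those late.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_ROWS 1080

/* Hypothetical per-frame bookkeeping a destination filter would need if
 * slice notifications were allowed to skip parts of the image. */
struct coverage {
    int height;
    bool done[MAX_ROWS];
};

static void coverage_reset(struct coverage *c, int height)
{
    assert(height > 0 && height <= MAX_ROWS);
    c->height = height;
    memset(c->done, 0, sizeof c->done);
}

/* Record a slice notification for rows [y, y+h), clipped to the frame. */
static void coverage_mark(struct coverage *c, int y, int h)
{
    for (int r = y; r < y + h && r < c->height; r++)
        if (r >= 0)
            c->done[r] = true;
}

/* At end of frame: store up to `max` maximal runs of rows that were never
 * notified as (ys[i], hs[i]). Returns the total number of runs found,
 * which may exceed `max`. */
static int coverage_missing(const struct coverage *c, int *ys, int *hs,
                            int max)
{
    int n = 0;
    for (int r = 0; r < c->height; ) {
        if (c->done[r]) { r++; continue; }
        int start = r;
        while (r < c->height && !c->done[r])
            r++;
        if (n < max) { ys[n] = start; hs[n] = r - start; }
        n++;
    }
    return n;
}
```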

> To implement it in g2, we need 2 different things:
> - a negotiation system for type-1 slice support
>   (also negotiating the slice type, i.e. line(s) of pixels, line(s) of macroblocks,
>    single macroblock, other)

How complicated should the negotiation system be?
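As one possible answer, a minimal negotiation could be a bitmask: the consumer advertises which slice granularities it accepts, and the producer picks the first of its own preferences that matches, falling back to whole frames. This is a sketch with hypothetical names (enum slice_kind, negotiate_slices); it ignores slice sizes and alignment, which a real scheme might also need to negotiate.

```c
#include <assert.h>

/* Hypothetical slice granularities, one bit each, as listed above. */
enum slice_kind {
    SLICE_NONE       = 0,       /* no slices: whole-frame delivery */
    SLICE_PIXEL_ROWS = 1 << 0,  /* line(s) of pixels */
    SLICE_MB_ROWS    = 1 << 1,  /* line(s) of macroblocks */
    SLICE_SINGLE_MB  = 1 << 2,  /* one macroblock at a time */
    SLICE_OTHER      = 1 << 3
};

/* The producer lists what it can emit in order of preference; the consumer
 * advertises a mask of what it accepts. The first mutual match wins. */
static enum slice_kind negotiate_slices(const enum slice_kind *prefs, int n,
                                        unsigned consumer_mask)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        if (consumer_mask & (unsigned)prefs[i])
            return prefs[i];
    return SLICE_NONE;
}
```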

> - an optional function in the vf API for slice-finished notifications (type 2)

Agree. This also requires some way of registering with the destination
filter that you're going to be passing the image with slices, so that
it can in turn allocate an output buffer to put its filtered output
into (possibly also with slices).
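A sketch of such a registration step, assuming a hypothetical three-phase protocol (filter_start, filter_slice, filter_end) and a toy image of one int per row. The start call lets each filter allocate its output buffer and pass the announcement downstream before any slice arrives. Each slice is then forwarded as soon as it has been filtered.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical filter node in a chain; the "image" is one int per row. */
struct filter {
    struct filter *next;
    int *out;
    int height;
    int rows_seen;
};

/* Announce a sliced image: allocate output once, then register downstream. */
static void filter_start(struct filter *f, int height)
{
    f->out = calloc((size_t)height, sizeof *f->out);
    assert(f->out != NULL);
    f->height = height;
    f->rows_seen = 0;
    if (f->next)
        filter_start(f->next, height);
}

/* Process rows [y, y+h) of the input and forward the same band. */
static void filter_slice(struct filter *f, const int *in, int y, int h)
{
    for (int r = y; r < y + h; r++)
        f->out[r] = in[r] + 1;             /* stand-in for real filtering */
    f->rows_seen += h;
    if (f->next)
        filter_slice(f->next, f->out, y, h);
}

/* End of image: tear down downstream first, then free our own buffer. */
static void filter_end(struct filter *f)
{
    if (f->next)
        filter_end(f->next);
    free(f->out);
    f->out = NULL;
}
```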

BTW, one really nasty issue you didn't bring up, which also comes into
this, is the whole buffer-age thing... When using type-2 slices, you
have _two_ buffer ages: the age of the buffer you have direct access
to, and the age of the buffer that the destination filter is rendering
its slice output into. Think of vd->scale(colorspace_only)->vo. It
would be very nice if the decoder could skip rendering unchanged
macroblocks in its yuv output buffers, and also if the scale filter
could skip converting unchanged macroblocks into framebuffer. But
which blocks are "unchanged" for each target may be different!! :/

IMO the only viable solution is some sort of powerful buffer-age
management at the vp layer level...
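A minimal sketch of what vp-level age bookkeeping could look like, under assumed names (struct mb_ages, ages_mark_changed, ages_dirty). It keeps one "last changed in frame N" stamp per macroblock and one age per destination buffer (the frame whose contents that buffer currently holds). The same stamps then yield different dirty sets for, say, the decoder's YUV buffer and the framebuffer. It assumes the first frame marks every macroblock changed and that a never-written buffer has age 0.

```c
#include <assert.h>

#define NUM_MB 8

/* Hypothetical vp-level age bookkeeping. */
struct mb_ages {
    unsigned frame;                 /* current frame number */
    unsigned changed[NUM_MB];       /* frame in which each MB last changed */
};

static void ages_new_frame(struct mb_ages *a) { a->frame++; }

static void ages_mark_changed(struct mb_ages *a, int mb)
{
    assert(mb >= 0 && mb < NUM_MB);
    a->changed[mb] = a->frame;
}

/* A macroblock must be (re)rendered into a buffer that holds frame
 * `buf_age` iff it changed after that frame. Writes the dirty MB indices
 * to `out` (room for NUM_MB entries) and returns how many there are. */
static int ages_dirty(const struct mb_ages *a, unsigned buf_age, int *out)
{
    int n = 0;
    for (int i = 0; i < NUM_MB; i++)
        if (a->changed[i] > buf_age)
            out[n++] = i;
    return n;
}
```

In the vd->scale->vo example, the decoder would query with the age of its YUV buffer and the scale filter with the age of the framebuffer it is about to write, and each skips only the blocks that are unchanged for its own target.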

Rich