[MPlayer-G2-dev] more on g2 & video filters

Tue Sep 30 17:16:05 CEST 2003

A few additional things that came up while talking to Ivan on IRC:

* I forgot about slices.
* I should include examples of buffer alloc/release for IPB codecs.

About slices...actually I think there are two different types of
slices:

1) Simple slices -- source gets a dummy buffer structure with no
   actual pointers in it, and sends the picture to dest one slice at a
   time via draw_slice, passing a pointer to the block to copy.

2) Hybrid slices -- source has an actual buffer (indirect or export,
   or perhaps even direct) which it obtained with a slices flag set,
   and it calls draw_slice with rectangles within this buffer, as they
   are ready.

So now I need to explain why having both is beneficial...

Type (1), simple slices, corresponds to the way libmpeg2 slice-renders
B frames, reusing the same small buffer for each slice to ease stress
on the cache. It could also be used for other slice rendering, but
type (2), hybrid slices, has a big advantage! Suppose you have the
following filter chain:

    vd ----> expand ----> vo

and suppose the vo's buffers are in video memory, so the (IP-type)
codec can't direct render through expand. Also suppose the user wants
expand to render OSD.

Now, vd draws with slices to improve performance. The expand filter
could either pass them on to the vo, or get a direct (dr) buffer from
the vo and draw the slices right into it. And here's where the
performance benefit comes. Let's say expand does direct rendering to
vo. Expand's pull_image has called vd's pull_image, which responds
with a sequence of draw_slice calls and then returns the buffer
structure. If we're using hybrid slices, this returned buffer actually
has valid pointers to the decoded picture in it, so expand can use
them as the source for alpha-blending OSD onto the dr buffer in video
memory. No reads from video memory are needed!
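
To make the "no reads from video memory" point concrete, here is a
minimal sketch of the blend step. Everything in it is an assumption
for illustration: demo_blend_osd, its parameters, and the plain 0..255
alpha convention are made up and are not the real OSD code. The only
point is that src (the hybrid-slice buffer in system memory) is read
and dst (the dr buffer) is write-only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: blend an OSD bitmap over a decoded picture.
 * src   = decoded picture in system memory (the hybrid-slice buffer)
 * dst   = direct-rendering buffer, assumed to be in video memory
 * alpha = 0 keeps the picture, 255 shows only the OSD
 * dst is only ever written, never read. */
static void demo_blend_osd(uint8_t *dst, int dst_stride,
                           const uint8_t *src, int src_stride,
                           const uint8_t *osd, const uint8_t *alpha,
                           int osd_stride, int w, int h)
{
    assert(w >= 0 && h >= 0);
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            int a = alpha[y * osd_stride + x];
            int s = src[y * src_stride + x];
            int o = osd[y * osd_stride + x];
            /* rounded integer blend of picture and OSD */
            dst[y * dst_stride + x] =
                (uint8_t)((s * (255 - a) + o * a + 127) / 255);
        }
    }
}
```

With simple slices the same blend would have to happen inside each
draw_slice call, since the source pixels are gone afterwards.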

Actually for the OSD/expand example here, it should be possible to do
the alpha-blending during the actual draw_slice calls, as long as OSD
contents are already known at the time. But there could be other
situations where it would be useful to do some computations at
slice-rendering time (certain localized computations that don't modify
the image -- maybe edge or combing detection) while the data is still
in the cache, and then use the results later once the whole frame is
available for more large-scale or global filtering.

A couple of proposed rules for slices:

1. No two slices for the same frame may overlap.
2. With hybrid-type slices, source may NOT modify any region of the
   buffer which has already been submitted via draw_slice.

I'm still a bit undecided on rule #2; it may be better to make this
behavior dependent upon the "reusable" buffer flag.
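
Rule 1 is easy to check mechanically. A small sketch, assuming slices
are described as axis-aligned rectangles; demo_slice_t and both
functions are hypothetical names, not part of any existing API:

```c
#include <assert.h>

/* Hypothetical slice rectangle: top-left corner plus size, in pixels. */
typedef struct { int x, y, w, h; } demo_slice_t;

/* Nonzero if the two rectangles share at least one pixel. */
static int demo_slices_overlap(const demo_slice_t *a, const demo_slice_t *b)
{
    return a->x < b->x + b->w && b->x < a->x + a->w &&
           a->y < b->y + b->h && b->y < a->y + a->h;
}

/* Nonzero if no two of the n slices submitted for a frame overlap
 * (rule 1). Pairwise check, fine for the handful of slices per frame. */
static int demo_slices_valid(const demo_slice_t *s, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if (demo_slices_overlap(&s[i], &s[j]))
                return 0;
    return 1;
}
```

Something like this would probably only be worth running in a debug
build of the link layer.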

Updated buffer types list (indirect renamed):

Direct -- owned by dest, allows direct rendering
Indirect -- owned by dest, but no pointers (slices required)
Export -- owned by source
Auto[matic] -- allocated/owned by vp link layer
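
As a rough sketch, the list could map onto an enum plus two lookup
helpers. All identifiers here are hypothetical; the only facts taken
from the list are who owns each type and that indirect buffers have no
pointers and therefore require slices.

```c
#include <assert.h>

/* Hypothetical names for the four buffer types listed above. */
typedef enum {
    DEMO_BUF_DIRECT,
    DEMO_BUF_INDIRECT,
    DEMO_BUF_EXPORT,
    DEMO_BUF_AUTO
} demo_buf_type_t;

typedef enum {
    DEMO_OWNER_DEST,
    DEMO_OWNER_SOURCE,
    DEMO_OWNER_LINK
} demo_owner_t;

/* Who owns a buffer of the given type, per the list above. */
static demo_owner_t demo_buffer_owner(demo_buf_type_t t)
{
    switch (t) {
    case DEMO_BUF_DIRECT:
    case DEMO_BUF_INDIRECT: return DEMO_OWNER_DEST;
    case DEMO_BUF_EXPORT:   return DEMO_OWNER_SOURCE;
    case DEMO_BUF_AUTO:     return DEMO_OWNER_LINK;
    }
    assert(!"unknown buffer type");
    return DEMO_OWNER_LINK;
}

/* Only indirect buffers carry no pointers, so only they force slices. */
static int demo_buffer_needs_slices(demo_buf_type_t t)
{
    return t == DEMO_BUF_INDIRECT;
}
```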

API issues:

I'm a bit at a loss as to how to make the hybrid slices API clean and
usable (so that a filter/vd can detect availability of different
methods and select the optimal one), but plain simple slices are, well,
simple. You get the indirect buffer with vp_get_buffer, then call
vp_draw_slice to draw into/through it, and eventually return the
indirect buffer (not necessarily in order; out-of-order rendering is
possible just like with DR) to the caller.
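
The simple-slices flow can be sketched roughly as below. The real
vp_get_buffer / vp_draw_slice signatures are not settled, so the
demo_* functions and types are stand-ins invented for this example,
and the "decoder" just produces a test pattern. Note how one small
scratch buffer is reused for every slice and the source never sees a
pointer into the destination.

```c
#include <assert.h>
#include <string.h>

#define FRAME_W 32
#define FRAME_H 48
#define SLICE_H 16

/* Stands in for the destination's real (possibly unreadable) memory. */
typedef struct { unsigned char pixels[FRAME_H][FRAME_W]; } demo_dest_t;

/* Hypothetical indirect buffer: size only, no plane pointers exposed. */
typedef struct { int w, h; demo_dest_t *dest; } demo_indirect_buf_t;

static demo_indirect_buf_t demo_get_indirect_buffer(demo_dest_t *dest)
{
    demo_indirect_buf_t b = { FRAME_W, FRAME_H, dest };
    return b;
}

/* Copy one slice from the source's block into the destination. */
static void demo_draw_slice(demo_indirect_buf_t *buf,
                            const unsigned char *src, int src_stride,
                            int x, int y, int w, int h)
{
    assert(x >= 0 && y >= 0 && x + w <= buf->w && y + h <= buf->h);
    for (int row = 0; row < h; row++)
        memcpy(&buf->dest->pixels[y + row][x],
               src + row * src_stride, (size_t)w);
}

/* Fake decoder: fills the scratch buffer with a pattern for rows
 * y0 .. y0+SLICE_H-1. */
static void demo_decode_slice(unsigned char *scratch, int y0)
{
    for (int row = 0; row < SLICE_H; row++)
        for (int col = 0; col < FRAME_W; col++)
            scratch[row * FRAME_W + col] =
                (unsigned char)((y0 + row) ^ col);
}

/* Source side: the same small scratch buffer is reused per slice. */
static void demo_render_frame(demo_indirect_buf_t *buf)
{
    unsigned char scratch[SLICE_H * FRAME_W];
    for (int y = 0; y < buf->h; y += SLICE_H) {
        demo_decode_slice(scratch, y);
        demo_draw_slice(buf, scratch, FRAME_W, 0, y, buf->w, SLICE_H);
    }
}
```

After demo_render_frame the source would hand the indirect buffer back
to its caller, exactly as it would a DR buffer.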

The problem with hybrid slices is that some filters may only accept
hybrid slices (if they need to write into nonreadable memory but also
need to be able to read the source image again later -- see my OSD
example above) while some filters and decoders (e.g. libmpeg2
rendering a B frame) will prefer simple slices, and might only support
simple slices. The situation gets more complicated if slices propagate
through several filters.

A thought for Ivan: Slices and XVMC.

From what I understand, the current XVMC code uses slices to pass the
motion vectors & dct coefficients to the vo, so that a function in the
vo will get called in coded-frame-order. But since slices are used
rather than direct rendering, this wastes an extra copy (data has to
be copied from mplayer's codec's slice buffer to shared-mem/X buffer).

If we find a good way to do hybrid slices with direct-type buffers,
then the codec could DR into shared mem to begin with, and the
draw_slice calls would just notify the vo that the data is ready. If
someone's using XVMC, they probably have a system that's barely fast
enough for DVD playback, so eliminating a copy could make the
difference that allows full-framerate DVD.

"Enough of slices..."
or "Examples with IPB codecs" (also for Ivan)

Let's say we have a codec with IPB frames, rendering to VO via direct
rendering. Coded frame order is IPB and display order is IBP (for the
first three frames). The first time codec's pull_image is called, the
decoder...

1. Gets direct buffer from vo via vp_get_buffer.
2. Decodes first I frame into the buffer.
3. Adds a reference to the buffer with vp_lock_buffer so it can be
   kept for predicting the next frame, and stores the pointer in its
   private data area.
4. Returns the buffer.

Simple enough. Now the next call. The decoder...

1. Gets direct buffer from vo via vp_get_buffer.
2. Looks up the pointer for the previous I frame from its private data
   area.
3. Decodes the P frame into the new buffer.
4. Also stores the pointer to the new buffer in private area.
5. Gets another direct buffer from vo.
6. Renders B frame into new buffer based on the I and P buffers.
7. Returns the pointer to the B frame buffer without locking it.

We've now decoded 3 frames and output 2. On the third call, the
decoder does the following:

1. Sees that it's time to output the P frame, so the old I frame is no
   longer useful for prediction.
2. Releases the old I buffer. As far as the codec is concerned now,
   that buffer no longer exists.
3. Locks the P buffer so it won't be lost when the vo releases it.
4. Returns the P buffer.
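
The three calls above can be traced with a toy reference-counting
model. This is only a sketch under assumptions: the demo_* functions
stand in for vp_get_buffer, vp_lock_buffer and a release call whose
real signatures are not defined yet, a fresh buffer is assumed to
start with one reference, and the vo is assumed to drop one reference
after showing a frame.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 4

typedef struct { int refcount; char frame_type; } demo_buf_t;
typedef struct { demo_buf_t bufs[POOL_SIZE]; } demo_pool_t;

/* Hand out a free buffer holding one reference (assumed semantics). */
static demo_buf_t *demo_get_buffer(demo_pool_t *p)
{
    for (int i = 0; i < POOL_SIZE; i++)
        if (p->bufs[i].refcount == 0) {
            p->bufs[i].refcount = 1;
            return &p->bufs[i];
        }
    return NULL;
}
static void demo_lock_buffer(demo_buf_t *b)
{ assert(b->refcount > 0); b->refcount++; }
static void demo_release_buffer(demo_buf_t *b)
{ assert(b->refcount > 0); b->refcount--; }

typedef struct {
    demo_pool_t *pool;
    demo_buf_t *prev_ref;  /* older reference frame (the I frame) */
    demo_buf_t *next_ref;  /* decoded P frame, not yet output */
    int call;
} demo_decoder_t;

static demo_buf_t *demo_pull_image(demo_decoder_t *d)
{
    demo_buf_t *out = NULL;
    switch (d->call++) {
    case 0:  /* first call: decode I, lock it, return it */
        out = demo_get_buffer(d->pool);
        if (!out) return NULL;
        out->frame_type = 'I';
        demo_lock_buffer(out);
        d->prev_ref = out;
        break;
    case 1:  /* second call: decode P (kept), decode B, return B */
        d->next_ref = demo_get_buffer(d->pool);
        out = demo_get_buffer(d->pool);
        if (!d->next_ref || !out) return NULL;
        d->next_ref->frame_type = 'P';
        out->frame_type = 'B';  /* returned without locking */
        break;
    case 2:  /* third call: drop I, lock P, return P */
        demo_release_buffer(d->prev_ref);
        d->prev_ref = d->next_ref;
        d->next_ref = NULL;
        demo_lock_buffer(d->prev_ref);
        out = d->prev_ref;
        break;
    default:
        break;
    }
    return out;
}
```

In this model the B buffer goes back to the pool as soon as the vo
releases it, while I and P each survive one release by the vo because
the decoder still holds its own reference.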

The same procedure works in principle for slices, except the decoder
must keep both indirect buffers (from the vo, which it returns so the
frames can be shown) and automatic buffers (from the link layer, which
it keeps for prediction). As the slices API is not yet finalized, it
may turn out to be preferable to merge these buffer pairs into one.

Rich