[MPlayer-G2-dev] A new vf layer proposal...

D Richard Felker III dalias at aerifal.cx
Fri Sep 12 01:53:01 CEST 2003


Early in G2 development we discussed changes to the vf layer, and some
good improvements were made over G1, but IMO a lot of it is still ugly
and unsatisfactory. Here are the 3 main problems:

1) No easy way to expand the filter layer to allow branching rather
   than just linear filter chain. Think of rendering PIP (picture in
   picture, very cool for PVR type use!) or fading between separate
   clips in a video editing program based on G2.

2) The old get_image approach to DR does not make it possible for the
   destination filter/vo to know when the source filter/vd is done
   using a particular reference image. This means DR will not be
   possible with emerging advanced codecs which are capable of using
   multiple reference frames instead of the simple I/P/B model.

3) The whole vf_process_image loop is ugly and makes artificial
   distinction between "pending" images (pulled) and the old G1
   push-model drawing.

Actually (3) has a lot to do with (1).

So the proposal for the new vf layer is that everything be "pull"
model, i.e. the player "pulls" an image from the last filter, which in
turn (recursively) pulls an image from the previous filter (or perhaps
from multiple previous filters!).

Such a design was discussed early on in G2 development, but we ran
into problems with auto-insertion of conversion filters. However I
think the following proposal solves the problem.


vf_get_buffer (2 cases):

1) Next filter can accept our output format. If the next filter
   implements get_buffer, call that. Otherwise get a buffer from a
   pool for this filter-connection, growing the pool if all the
   buffers are already in use.

2) Next filter doesn't like our output format. Insert appropriate
   conversion filter and then do (1).
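As a rough sketch of how those two cases might dispatch — all names here
(accepted_fmt, pool_get, insert_conversion) are stand-ins I made up, not
real G2 API:

```c
#include <stdlib.h>

typedef struct mp_image { int fmt; int refcount; } mp_image;
typedef struct vf_instance vf_instance;
struct vf_instance {
    vf_instance *next;
    int accepted_fmt;   /* stand-in for a real format query */
    mp_image *(*get_buffer)(vf_instance *vf, int fmt);
};

/* Stand-in for the per-connection buffer pool (would grow on demand). */
static mp_image *pool_get(vf_instance *vf, int fmt)
{
    (void)vf;
    mp_image *mpi = calloc(1, sizeof *mpi);
    mpi->fmt = fmt;
    mpi->refcount = 1;
    return mpi;
}

/* Stand-in for auto-inserting a conversion filter (e.g. vf_scale). */
static vf_instance *insert_conversion(vf_instance *next, int fmt)
{
    vf_instance *conv = calloc(1, sizeof *conv);
    conv->next = next;
    conv->accepted_fmt = fmt;   /* the converter accepts our format */
    return conv;
}

mp_image *vf_get_buffer(vf_instance *vf, int fmt)
{
    /* Case 2: next filter doesn't like our output format -- splice in
     * a converter, then fall through to case 1 against the new next. */
    if (vf->next->accepted_fmt != fmt)
        vf->next = insert_conversion(vf->next, fmt);

    /* Case 1: next filter implements get_buffer (DR), otherwise draw
     * into a buffer from this filter-connection's pool. */
    if (vf->next->get_buffer)
        return vf->next->get_buffer(vf->next, fmt);
    return pool_get(vf, fmt);
}
```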

In all cases, buffers obtained from get_buffer have reference counts.
When a buffer is first obtained, it has reference count=1, meaning
that the destination filter has a hold on it because it wants the
output which the source filter is drawing into the buffer. If the
source filter does not need to use the image as a reference for future
frames, it can just return the image to the caller and the destination
filter will unlock the buffer (thus freeing it for reuse) when it's
finished using the image as input. On the other hand, if the source
filter needs to keep the image as a reference for future frames, it
can add its own lock (vf_lock_buffer) so that the image still has a
nonzero reference count once the destination filter finishes using it.
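The buffer life cycle above can be sketched like this — a minimal toy,
with vf_lock_buffer/vf_unlock_buffer as illustrative names and none of
the state a real mp_image would carry:

```c
#include <stdlib.h>

typedef struct mp_image {
    int refcount;   /* 1 on get_buffer: the destination filter's hold */
    int in_use;     /* nonzero while some filter still needs it */
} mp_image;

static mp_image *vf_get_buffer(void)
{
    mp_image *mpi = calloc(1, sizeof *mpi);
    mpi->refcount = 1;   /* destination's implicit lock */
    mpi->in_use = 1;
    return mpi;
}

/* Source filter keeps the image as a reference for future frames. */
static void vf_lock_buffer(mp_image *mpi)
{
    mpi->refcount++;
}

/* Any holder finishes with the image; at zero it goes back to the pool. */
static void vf_unlock_buffer(mp_image *mpi)
{
    if (--mpi->refcount == 0)
        mpi->in_use = 0;
}
```

So a codec using multiple reference frames just calls vf_lock_buffer
before returning each frame downstream; the destination's later unlock
leaves the count nonzero and the buffer stays valid.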

In addition to the above behavior, flags can be used to signal who
(source or dest) is allowed to read/modify the image, and when. Thus,
we have the equivalencies (old system to new system):

TEMP: source does not lock the buffer

TEMP+READABLE: source does not lock the buffer, but is allowed to read
it (i.e. it can't be in video mem)

IP: source locks buffer and is allowed to read it

STATIC: source locks buffer and is allowed to write to it again after
passing it on to the destination

STATIC+READABLE: source locks buffer and is allowed to read it and
write again after passing it on

These explanations are fairly rough; they're just meant to give an
idea of how things convert over. There's probably a need for a
function similar to get_buffer, but which instead notifies the next
filter that you want to reuse a buffer you already have (from previous
locking) as the output for another frame. But as far as I can tell,
all of this is minor detail that doesn't affect the proposal as a
whole.
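One possible encoding of those equivalencies as lock + permission bits —
purely illustrative, the real flag names would be something else:

```c
/* Hypothetical per-buffer bits for the new scheme. */
enum {
    VF_BUF_SRC_LOCK  = 1,   /* source adds its own reference */
    VF_BUF_SRC_READ  = 2,   /* source may read it (so not video mem) */
    VF_BUF_SRC_WRITE = 4,   /* source may write after passing it on */
};

/* Old G1 buffer types expressed in the new bits. */
#define OLD_TEMP            0
#define OLD_TEMP_READABLE   (VF_BUF_SRC_READ)
#define OLD_IP              (VF_BUF_SRC_LOCK | VF_BUF_SRC_READ)
#define OLD_STATIC          (VF_BUF_SRC_LOCK | VF_BUF_SRC_WRITE)
#define OLD_STATIC_READABLE (VF_BUF_SRC_LOCK | VF_BUF_SRC_READ | \
                             VF_BUF_SRC_WRITE)
```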




Now we get to the more interesting function.....


vf_pull_image:

This one has three cases:

1) The filter calling vf_pull_image was just created (by
   auto-insertion) during the previous filter's pull_image, so we
   do NOT want to call the previous filter's pull_image again. Instead
   we've saved the mpi returned by the previous filter somewhere in
   our vf structure, so we just clear that from the vf structure and
   immediately return it. This is a minor hack to make auto-inserted
   filters work, but it's not visible from outside of the
   vf_pull_image function itself, so IMO it's not ugly.

2) We call the previous filter's pull_image and get an image with
   destination == the calling filter. Return the image to the caller.

3) The previous filter's pull_image returns an image whose destination
   is *not* the calling filter. This means a conversion filter must
   have been inserted during the previous filter's pull_image (as a
   result of it calling get_buffer). In that case, stash the image as
   the inserted filter's pending image and pull from that filter
   instead, repeating until we get an image destined for us.


Summary:     (may have 10l bugs :)

if (vf->pending_mpi) {
    mpi = vf->pending_mpi;
    vf->pending_mpi = NULL;
    return mpi;
}
while ((mpi = src_vf->pull_image(vf, src_vf)) && mpi->dest_vf != vf) {
    mpi->dest_vf->pending_mpi = mpi;
    src_vf = mpi->dest_vf;
}
return mpi;


A couple comments about this. The nicest part of the design is that
vf_pull_image doesn't need to know so much about the 'chain' structure
of the filters. It should be called with something like:

    mpi = vf_pull_image(vf, vf->prev);

so that a filter which wants multiple sources could do something like:

    mpi1 = vf_pull_image(vf, vf->priv->src1);
    mpi2 = vf_pull_image(vf, vf->priv->src2);

or whatever. Actually the source should probably be passed to
vf_pull_image as a pointer so that it can be updated when a conversion
filter is auto-inserted.
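Here's a hedged sketch of that pointer-to-source variant, with a toy
two-filter chain bolted on so it actually runs; all names are made up
and the toy "converter" consumes its pending image directly instead of
recursing through vf_pull_image the way a real one would:

```c
#include <stddef.h>

typedef struct vf_instance vf_instance;
typedef struct mp_image {
    vf_instance *dest_vf;   /* filter this buffer was drawn for */
} mp_image;

struct vf_instance {
    mp_image *pending_mpi;  /* stash for auto-inserted filters */
    mp_image *(*pull_image)(vf_instance *dst, vf_instance *self);
};

mp_image *vf_pull_image(vf_instance *vf, vf_instance **src)
{
    mp_image *mpi;
    /* Case 1: we were just auto-inserted; return the stashed image. */
    if (vf->pending_mpi) {
        mpi = vf->pending_mpi;
        vf->pending_mpi = NULL;
        return mpi;
    }
    /* Cases 2/3: pull upstream; if the image was drawn for a newly
     * inserted filter rather than for us, stash it there and re-pull
     * from that filter. Writing through *src repairs the caller's
     * link when a conversion filter was auto-inserted. */
    while ((mpi = (*src)->pull_image(vf, *src)) && mpi->dest_vf != vf) {
        mpi->dest_vf->pending_mpi = mpi;
        *src = mpi->dest_vf;
    }
    return mpi;
}

/* --- toy chain: a decoder whose frame was drawn for an auto-inserted
 *     converter rather than for the caller --- */
static mp_image *conv_pull(vf_instance *dst, vf_instance *self);
static mp_image *dec_pull(vf_instance *dst, vf_instance *self);

static vf_instance conv = { NULL, conv_pull };
static vf_instance decoder = { NULL, dec_pull };
static mp_image frame_for_conv = { &conv };
static mp_image frame_for_caller;

static mp_image *dec_pull(vf_instance *dst, vf_instance *self)
{
    (void)dst; (void)self;
    return &frame_for_conv;             /* destined for the converter */
}

static mp_image *conv_pull(vf_instance *dst, vf_instance *self)
{
    mp_image *in = self->pending_mpi;   /* stashed by vf_pull_image */
    self->pending_mpi = NULL;
    (void)in;                           /* "convert" it... */
    frame_for_caller.dest_vf = dst;
    return &frame_for_caller;
}
```

After the pull, the caller's source pointer has been silently updated
to the inserted converter, which is the whole point of passing it by
pointer.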

Also note that my proposal has mpi structure containing pointers to
the dest (and possibly also source) filters with which the buffer is
associated. I'm not sure this is entirely necessary, but it seems like
a good idea.

Of course the best part of all is that, from the calling program's
perspective and the filter authors' perspective, vf_pull_image looks
like a 100% transparent pull-based recursive frame processing system.
No ugly process_image/get_pending_image distinction and push/pull mix,
just a sequential flow of frames.



Comments? I believe there are a few details to be worked out,
especially in what happens when a filter gets auto-inserted by
get_buffer, how buffer pools work, etc., but the basic design is
sound. Concerns about get_buffer (e.g. whether you release a buffer
before or after you return it, and if after, how) have been eliminated
by use of reference counts and there seem to be no major obstacles to
implementing the vf_pull_image system as described.


At some point in the not-too-distant future I'd like to begin porting
filters (especially pullup) to G2 and writing mencoder-g2, so I hope
we can discuss the matter of overhauling the vf layer soon and then
get around to some actual coding.


                                                  Rich


P.S. One more thing: I made no mention of how configuration
(especially output size and all the resize nonsense Arpi was talking
about :) works. I'll be happy to discuss that later, but I'd like to
see what Arpi suggests first since that's all very confusing to me,
and I don't think the design I've described above makes much
difference to it...


