[MPlayer-G2-dev] vo3

Ivan Kalvachev ivan at cacad.com
Sun Dec 28 03:51:23 CET 2003


D Richard Felker III said:
> On Sat, Dec 27, 2003 at 01:01:22AM +0200, Ivan Kalvachev wrote:
>> Hi, here are some of my ideas.
>> I'm afraid it is already too late for them to be implemented, as
>> dalias is coding his pipeline system while I haven't even finished
>> the drafts, but feel free to send comments...
>
> Time to pour on the kerosene and light the flames... :)
You are very bad at making good points; you always fall into flames. I
have noticed that you like flames, even if you sometimes burn in them
yourself.

>
> First comment: putting everything in a PDF file makes it very
> difficult to quote and reply to individual parts of your draft. I'll try
> my best... [all following quotes are from the PDF]
Yep. The original is in OOo format, and you told me that you don't want
to install "that bloat".

>
>> Here are a few features that I'm trying to achieve:
>> - decreasing memcpy by using one and the same buffer for all filters
>> that can do it (already done in G1 as Direct Rendering method 1)
>> - support for partial rendering (slices and DR method 2)
>
> These are obviously necessary for any system that's not going to be
> completely unusably sucky. And they're already covered in G1 and G2 VP.
Flame. I know that, you know that, everybody knows that.
>
>> - support for get/release buffer (the ability to release
>> buffers when they are no longer needed)
>
> This is not so obvious at first, but absolutely necessary for
> overcoming bugs in G1 that prevented all but the simplest filters from
> using buffer sharing/DR. It's also already covered in G2 VP -- in fact
> it was one of the two key design points.
yep

>
>> - out-of-order rendering - the ability to move the data through the
>> video filters if there is no temporal dependency
>> - display_order rendering - this is for filters that need to use
>> temporal dependencies
>
> Ivan and I disagree greatly on the nature of these goals. To me, they're
> a simple consequence of a natural way of thinking about frame passing
> and slice rendering. To him, out-of-order is the fundamental frame
> passing protocol, and special care is required for handling frames in
> order.
YES!


>
>> - the ability to keep as many incoming images as needed and to
>> output as many images as a filter may need (e.g. in the case of
>> motion blur we may have 6 incoming images and 6 outgoing at once)
>> - support for PTS.
>
> These were the primary motivation behind G2 VP.
yes.

>
>> - the ability to quickly reconfigure and, if possible, to reuse data
>> that is already processed (e.g. we have scale and the user resizes
>> the image - only images after scale will be redone)
>
> In my design, this makes no sense. The final scale filter for resizing
> would not pass any frames to the vo until time to display them.
final scale filter?!!!
How many scale filters do you have?

>
>> safe seeking, auto-insertion of filters.
>
> What is safe-seeking?
On seek, filters that have stored frames should flush them.
For example, right now both mpeg2 decoders don't do that, causing
garbage in B-frame decoding after a seek. The same applies to any
temporal filter.
In G1 there is control(SEEK, ...), but usually it is not used.
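
A minimal sketch of what I mean, assuming a G1-style control() entry
point (the request name, the priv fields and release_image() here are
illustrative, not real API):

    struct mp_image;                    /* a decoded frame, as in G1     */
    struct vf_instance;                 /* a filter instance, as in G1   */
    void release_image(struct vf_instance *vf, struct mp_image *mpi);

    #define VFCTRL_SEEK     7           /* hypothetical request id       */
    #define CONTROL_TRUE    1
    #define CONTROL_UNKNOWN (-1)

    struct vf_priv {
        struct mp_image *stored[16];    /* frames kept for temporal work */
        int n_stored;
    };

    /* On seek, drop every buffered frame so no stale reference (e.g. a
     * B-frame predecessor) survives the seek and produces garbage. */
    static int control(struct vf_instance *vf, struct vf_priv *p,
                       int request)
    {
        if (request != VFCTRL_SEEK)
            return CONTROL_UNKNOWN;
        for (int i = 0; i < p->n_stored; i++)
            release_image(vf, p->stored[i]);
        p->n_stored = 0;
        return CONTROL_TRUE;
    }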

>
> Auto-insertion is of course covered.
I'm not criticizing your system; this comment was not aimed at me.
Or do I hear irony?

>
>> - the ability to have a more complicated graph (rather than a simple
>> chain) for processing.
>
> This is definitely a desirable goal.
>
>> - simple structure and flexible design.
>
> IMNSHO the out-of-order stuff in Ivan's design is anything but simple.
Take the red pill and you will see the truth :))

>
>
>> In short, the ideas used are:
>> - common buffer and separate mpi - already exists in G1 in some form
>> - counting buffer usage by mpi and freeing it when no longer used -
>> huh, sounds like Java :O
>
> No. Reference counting is good. GC is idiotic. And you should never free
> buffers anyway until close, just repool them.
I didn't say that. A free buffer is a buffer that is not busy and could
be reused.
Moreover, I don't like the way frames are locked in your code. It
doesn't seem obvious. The goal in my design is for all the work of
freeing/querying to be moved into the vf/vp functions. So in my design
you will see only get_image/release_image, but never lock/unlock,
because holding a buffer means that you need it (well, most of the
time).
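
To show the difference, the whole buffer bookkeeping of a filter in my
scheme would look roughly like this (a sketch; get_image, pass_image
and release_image are the kind of names I have in mind, not a final
API):

    struct mp_image;
    struct vf_instance;

    /* hypothetical pipeline calls - holding an mpi *is* the lock */
    struct mp_image *get_image(struct vf_instance *to, int fmt,
                               int w, int h);
    void pass_image(struct vf_instance *to, struct mp_image *mpi);
    void release_image(struct vf_instance *from, struct mp_image *mpi);
    void filter_pixels(struct mp_image *out, struct mp_image *in);

    #define IMGFMT_YV12 0x32315659

    static void example_process(struct vf_instance *prev,
                                struct vf_instance *next,
                                struct mp_image *in, int w, int h)
    {
        struct mp_image *out = get_image(next, IMGFMT_YV12, w, h);

        filter_pixels(out, in);   /* whatever the filter actually does */

        pass_image(next, out);    /* hand our mpi to the next filter   */
        release_image(prev, in);  /* the input buffer may be reused    */
    }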

>
>> - allocating all mpi's & buffers before starting to draw (looks
>> obvious, doesn't it?) - in G1 filters had to copy frames into their
>> own buffers or gamble by using buffers out of their scope
>
> Yes, maybe G1 was broken. All the codecs/filters I know allocate
> mpi/buffers before drawing, though.
In G1, if you draw_slice out of order, it is possible to reach a filter
that hasn't yet allocated a buffer for this frame - some frames are
allocated only in put_frame.
That's also the reason to have one common process()!

>
>> - using an IN_ORDER flag to indicate that these frames are "drawn"
>> and no frames with an "earlier" PTS will come.
>
> I find this really ugly.
It's the only sane way to do it, if you really do out-of-order processing.

>
>> - using a common function for processing frames and slices - to make
>> slice support easier
>
> This can easily be done at the filter implementation level, if
> possible. In many cases, it's not. Processing the image _contents_ and
> the _frame_ are two distinct tasks.
Not so easy. Very few filters in G1 support slices, mainly because it
is a separate chain. See the sketch below for what I mean by a common
function.
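
Roughly, what I mean by one common process(): a full frame is just the
slice (0, 0, w, h), so a filter written once works for both cases (a
sketch; the prototypes are illustrative):

    struct vf_instance;

    struct mp_image {
        int width, height;          /* minimal fields for the example */
        /* ... planes, strides, pts, flags ... */
    };

    /* every filter implements exactly one entry point ... */
    static void process(struct vf_instance *vf, struct mp_image *in,
                        struct mp_image *out, int x, int y, int w, int h)
    {
        /* operate only on the rectangle (x, y, w, h) of in/out */
    }

    /* ... and the pipeline calls it for both cases: */
    static void on_slice(struct vf_instance *vf, struct mp_image *in,
                         struct mp_image *out, int x, int y, int w, int h)
    {
        process(vf, in, out, x, y, w, h);                   /* a slice     */
    }

    static void on_frame(struct vf_instance *vf, struct mp_image *in,
                         struct mp_image *out)
    {
        process(vf, in, out, 0, 0, in->width, in->height);  /* whole frame */
    }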

>
>> - emulating a complicated graph in a simple linked list.
>
> This sounds like an ugly hack.
Hack - yes; ugly - dunno??

>
>> - messaging system for dropping/rebuilding MPI's (not yet finished)
>
> Very bad.
>
>> - having prepared simple-type filters (like non-temporal - one
>> input/one output, processing the frame as it comes, without care for
>> buffer management) (not documented)
>
> Also provided for in G2 VP.

>
>> [...]
>> So, the frame is split into 2 parts; one I will call mpi and the
>> other I will call mp_buffer. The mp_buffer part contains the memory
>> buffer, usage count, the buffer's common width and height, and maybe
>> stride. The mp_buffer->count is the number of MPI's that point to
>> that buffer. We could probably allow a buffer to contain more than
>> one piece of memory (e.g. 3 memory blocks for the Y, U, V planes).
>
> An idea like this was already suggested by Arpi and adopted in G2 VP,
> but not as extreme. The reason for not making such a sharp division is
> that the owner of the buffer will often _need_ to know about the
> buffer's status as the contents of a given frame, not just which buffer
> it is.
Huh?
Buffer parameters are constants; MPI parameters are variables.
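
In code, the split would look roughly like this (a sketch; all field
and function names are illustrative):

    /* The constant part: one shared, reference-counted allocation. */
    struct mp_buffer {
        unsigned char *planes[3];   /* one block, or 3 for Y, U, V       */
        int stride[3];
        int width, height;          /* allocated size - never changes    */
        int count;                  /* number of mpi's using this buffer */
    };

    /* The variable part: one per filter link, points into a buffer. */
    struct mp_image {
        struct mp_buffer *buf;
        int w, h;                   /* visible area - may differ per mpi */
        double pts;
        int flags;                  /* IN_ORDER, SKIPPED, ...            */
    };

    void buffer_repool(struct mp_buffer *buf);   /* make it reusable */

    static void mpi_release(struct mp_image *mpi)
    {
        /* when the last mpi lets go, the buffer becomes free */
        if (--mpi->buf->count == 0)
            buffer_repool(mpi->buf);
    }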

>
> One thing omitted in G2 so far is allowing for mixed buffer types, where
> different planes are allocated by different parties. For
> example, exporting U and V planes unchanged and direct rendering a new Y
> plane. I'm not sure if it's worth supporting this, since it would be
> excessively complicated. However, it would greatly speed up certain
> filters such as equalizer.
Yes, I was thinking about such hacks, but they are definitely not worth
implementing. The Matrox YUV mode needs such a hack, but it could be
done at the vo level.

>
>> [...]
>> This scheme also allows us to get rid of the static buffer type: the
>> decoder simply never releases its mpi, but passes it to the filter
>> chain multiple times (like ffmpeg's reuse). On the other hand, static
>> buffers should always be in main memory, otherwise they can take the
>> only display buffer and stall displaying (e.g. a vo with one buffer
>> and a decoder with 2 static buffers).
>
> This is the same principle as the REUSABLE flag in G2 VP, except that DR
> buffers are also allowed to be reusable in my design.
I don't use a flag. Anyway, it is nothing major.

>> Dalias already pointed out that processing may not be strictly top
>> to bottom, and may not be line, slice, or block based. This question
>> is still open for discussion. Anyway, the most flexible way
>> (x,y,w,h) proved to be also the hardest and totally painful. Just
>> take a look at the crop or expand filters in G1. Moreover, the
>> current G1 scheme has some major flaws:
>> - the drawn rectangles may overlap (it depends only on the decoder)
>
> No, my spec says that draw_slice/commit_slice must be called exactly
> once for each pixel. If your codec is broken and does not honor this,
> you must wrap it or else not use slices.
The problem may arise in filter slices too! Imagine rounding errors ;)
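
A hypothetical debugging wrapper shows how the exactly-once rule could
be enforced, and how an overlap or a hole left by a rounding error
would be caught immediately:

    #include <assert.h>
    #include <stdlib.h>

    /* One byte per pixel; every draw_slice call marks its rectangle. */
    struct slice_check {
        unsigned char *covered;
        int w, h;
    };

    static void check_init(struct slice_check *c, int w, int h)
    {
        c->w = w;
        c->h = h;
        c->covered = calloc((size_t)w * h, 1);
    }

    static void check_slice(struct slice_check *c, int x, int y,
                            int w, int h)
    {
        for (int j = y; j < y + h; j++)
            for (int i = x; i < x + w; i++) {
                assert(i >= 0 && i < c->w && j >= 0 && j < c->h);
                assert(!c->covered[j * c->w + i]);   /* overlap!   */
                c->covered[j * c->w + i] = 1;
            }
    }

    /* called once per frame, e.g. from put_frame: */
    static void check_frame_done(const struct slice_check *c)
    {
        for (int k = 0; k < c->w * c->h; k++)
            assert(c->covered[k]);                   /* hole left! */
    }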

>
>> - drawing could be done in any order. This makes it very hard to say
>> which part of the image is already processed
>
> I agree, it's very ugly. IMO there should at least be certain minimal
> restrictions on slice structure, but I don't know what they should be.
> In any case, I don't like Ivan's idea of restricting slices to
> macroblock-high horizontal strips drawn in-order from top to bottom...
> Certainly broken codecs like VP3 will want to draw bottom-to-top.
I have never said anything about macroblock-high strips.
As for VP3, we could flip it at the beginning and then flip it back
before processing ;) yeah, ugly ;) j/k

>
>> - skipped_blocks processing is very hard. Theoretically it is
>> possible to draw only non-skipped blocks, but then the above
>> problem arises.
>
> I would _really_ like a clean solution to skipped_blocks processing.
> It's the final key to speed which we haven't solved... :(
>
>> The main problem is the out-of-order rendering. The filters should
>> be able to process the frames in the order they come. On the other
>> side, there are some filters that can operate only in display order.
>> So what is the solution?
>> By design the new video system requires PTS (picture time stamp). I
>
> PTS stands for PRESENTATION time stamp, not picture time stamp.
I thought I had fixed that.

>
>> add a new flag that I call IN_ORDER. This flag indicates that all
>> frames before this one are already available in the incoming/outgoing
>> area. Let's make an example with MPEG IPB order: we have decoding
>> order IPB and display order IBP.
>> First we have an I frame. We decode it first and we output it to the
>> filters. This frame is in order, so the flag should be set for it
>> (while processing). Then we have a P frame. We decode it, but we do
>> not set the flag (yet). We process the P frame too. Then we decode a
>> B frame that depends on the previous I and P frames. This B frame is
>> in order when we process it. After we finish with the B frame(s), the
>> first P frame is in order.
>
> This idea is totally broken, as explained by Michael on ffmpeg-devel. It
> makes it impossible for anything except an insanely fast computer to
> play files with B frames!! Here's the problem:
>
> 1. You decode first I frame, IN_ORDER.
> 2. You display the I frame.
> 3. You decode the P frame. Not IN_ORDER.
> 4. You decode the B frame. IN_ORDER.
> 5. You display the B frame, but only after wasting >frametime seconds,
>    thus causing A/V desync!!
> 6. The P frame becomes IN_ORDER.
> 7. You display the P frame.
> 8. Process repeats.
>
> The only solution is to always impose one-frame delay at the _decoder_
> end when decoding files with B frames. In Ivan's design, this can be
> imposed by waiting to set the IN_ORDER flag for an I/P frame until the
> next B frame is decoded.
Again you ascribe to me things that I haven't said. Actually I (or you)
may have missed one of the points. Well, I will add it to the goals:
buffering ahead.
I said that IN_ORDER is a replacement for draw_frame()!!!!
This means that in the above example the I frame won't be IN_ORDER yet,
so your problem is solved. Anyway, IN_ORDER doesn't force us to display
the frame; there is no need to start displaying frames the moment they
are completed.
I agree that there may be some problems for a vo with one buffer.
So far you have one (good) point.
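
So the decoder side would look something like this (my reading of the
scheme, with the buffering ahead included; all names here are
illustrative):

    struct mp_image { int flags; /* ... pts, planes ... */ };
    struct pipeline { struct mp_image *pending_ref; /* ... */ };

    #define IN_ORDER 1                 /* hypothetical flag bit */

    int  is_b_frame(const struct mp_image *mpi);
    void push_frame(struct pipeline *p, struct mp_image *mpi);

    /* Frames are pushed in DECODING order; IN_ORDER is granted in
     * DISPLAY order. A reference (I/P) frame is pushed immediately but
     * flagged only when the next reference frame arrives - exactly the
     * one-frame delay Rich asks for. (At end of stream, the flush
     * flags the last pending_ref.) */
    static void output_decoded(struct pipeline *p, struct mp_image *mpi)
    {
        if (is_b_frame(mpi)) {
            mpi->flags |= IN_ORDER;    /* B frames are always in order */
            push_frame(p, mpi);
        } else {
            push_frame(p, mpi);        /* out of order for now         */
            if (p->pending_ref)
                p->pending_ref->flags |= IN_ORDER;
            p->pending_ref = mpi;
        }
    }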

>> As you can see, it is very easy for the decoders to set the IN_ORDER
>> flag; it could be done at the end of G1's decode(), when the frames
>> are in order.
>
> Actually, this is totally false. Libavcodec does _not_ export any
> information which allows the caller to know if the frames are being
> decoded in order or not. :( Yes, this means lavc is horribly broken...
avcodec always outputs frames in display order, unless you manually set
flags like _OUT_OF_ORDER or _LOW_DELAY ;)

>
>> If an MPI is freed without IN_ORDER being set, then we can guess
>> that it has been skipped.
>
> Frame sources cannot be allowed to skip frames. Only the destination
> requesting frames can skip them.
If this rule is removed, then IN_ORDER loses all meaning. Usually a
filter that produces such frames is broken. A filter that wants to
remove duplicated frames may set the SKIPPED flag (well, if such a flag
exists ;)
SKIPPED/INVALID is required because there are always 2 mpi's that point
to one buffer (vf1->out and vf2->in).


>
>> Skipping/Rebuilding
>
> This entire section should be trashed. It's very bad design.
Didn't I say somewhere that it's not finished?

>
>> Now the skipping issue arises. I propose 2 flags that should be
>> added like the IN_ORDER flag; I call them SKIPPED and REBUILD. I
>> thought about one common INVALID, but it would have a different
>> meaning depending on the array it resides in (incoming or outgoing).
>> SKIPPED is required when a get_image frame has been obtained but the
>> processing is not performed. The first filter sets this flag in the
>> outgoing mpi, and when the next filter processes the data, it should
>> free the mpi (that is now in its incoming array). If the filter had
>> allocated another frame, where the skipped frame should have been
>> drawn, then it can free it by setting it as SKIPPED.
>
> Turn things around in the only direction that works, and you don't need
> an image flag for SKIPPED at all. The filter _requesting_ the image
> knows if it intends to use the contents or not, so if not, it just
> ignores what's there. There IS NO CORRECT WAY to frameskip from the
> source side.
I'm not talking about skipping frames to maintain A-V sync.
And decoders are on the source side - they DO skip frames. And in this
section I use SKIPPED in the meaning of INVALID, as you can see from
the quote.
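
The downstream filter would then handle it like this (a sketch; the
flag and the function names are illustrative):

    struct vf_instance;
    struct mp_image { int flags; /* ... */ };

    #define MP_SKIPPED 2               /* hypothetical flag bit */

    void release_image(struct vf_instance *vf, struct mp_image *mpi);

    /* vf1->out and vf2->in are two mpi's sharing one buffer; if vf1
     * got the buffer but never drew into it, vf2 must not read the
     * contents - it only drops its reference. */
    static void vf2_process(struct vf_instance *vf2, struct mp_image *in)
    {
        if (in->flags & MP_SKIPPED) {
            release_image(vf2, in);
            return;
        }
        /* ... normal processing ... */
    }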


>
>> E.g. if we have this chain:
>> -vf crop=720:540,spp=5:4,scale=512:384
>> This chain should give quite a thrill to a 2GHz processor. Now
>> imagine that scale is auto-inserted and that the vo is some windowed
>> RGB-only device (vo_x11). If the user changes the window size, the
>> scale parameters change too. Scale should rebuild all frames that
>> are processed but not yet shown. The scale filter can safely SKIP
>> all frames in its outgoing array.
>
> Bad point 1: manually created filters which have been given parameters
> MUST NEVER auto-reconfigure. In my design, if the user enabled dynamic
> window rescaling, another scale filter controlled by the UI layer would
> get inserted, and activated only when the window size was
> non-default.
I just gave an example of how the filter chain will look WHEN scale is
auto-inserted. Read carefully!! And don't hurry to flame. And don't
forget that the front-end will have full control over the config()
parameters.

How many scale filters are you planning to have? Don't you know that
the scale filter is slow?

>
> Bad point 2: your "rebuild" idea is not possible. Suppose the scale
> filter has stored its output in video memory, and its input has
> already been freed/overwritten. If you don't allow for this,
> performance will suck.
If you had read carefully, you would have seen that I pointed out that
problem too (with a solution I don't like very much). That's the main
reason this section is not complete.

>
>> [...]
>> -vf spp=5,scale=512:384,osd
>> [...]
>> Now the user turns off an OSD that has already been rendered into a
>> frame. Then vf_osd sets REBUILD for all affected frames in the
>> incoming array. The scale filter will draw the frame again, but it
>> won't call spp again. And this gives a big win because vf_spp can be
>> extremely slow.
>
> This is stupid. We have a much better design for osd: as it
> slice-renders its output, it makes backups (in very efficient form) of
> the data that's destroyed by overwriting/alphablending. It can then undo
> the process at any time, without ever reading from its old input buffers
> or output buffers. In fact, it can handle slices of any shape and size,
> too!
OSD is only an EXAMPLE, not the real case.
Well, then I gave a bad example. In fact, REBUILD is necessary when a
filter uses a buffer that is requested by the previous filter. Also, if
the vo invalidates the buffer for some reason, this is the only way it
can signal the rest of the filters.
Yeah, these issues are raised by the way I handle mpi/buffer, but I
have not seen any such system so far. Usually in such a situation all
filters get something like a reset and start over from the next frame.
Of course, this could be a lot of pain in an out-of-order scheme!


>
>> On the other side, there is one big problem - the mpi could already
>> be freed by the previous filter. To work around it we may need to
>> keep all buffers until the image is shown (something like
>> control(FLIP, pts) for all filters). The same thing may be used on
>> seek, to flush the buffers.
>
> This is an insurmountable problem. The buffers will very likely no
> longer exist. Forcing them to be kept will destroy performance.
You mean it will consume a lot of memory?
Huh?

>
>> Problems remaining!
>
> Lots more than you itemize!
>
>> 1. Interlacing - should the second field have its own PTS?
>
> In principle, definitely yes. IMO the easiest way to handle it is to
> require codecs that output interlaced video to set the duration field,
> and then pts of the second field is just pts+duration/2.
Why? Just because you like it that way?
Simply give examples.
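
For reference, the rule Rich proposes amounts to just this (a sketch;
the duration field is his assumption, not existing API):

    struct mp_image {
        double pts;        /* presentation time of the first field   */
        double duration;   /* whole-frame duration, set by the codec */
    };

    /* field 0 = first field, field 1 = second field */
    static double field_pts(const struct mp_image *mpi, int field)
    {
        return mpi->pts + field * (mpi->duration / 2.0);
    }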

>
>> P.S.
>> I absolutely forbid this document to be published anywhere. It is
>> only for mplayer developers' eyes. And please, somebody remove the
>> very old vo2 drafts from the G1 CVS.
>
> Then don't send it to public mailing lists... :)
The author is never limited by the license. I own the full copyright of
this document and I may set any rules on it.

>
> Sorry but IMO it's impossible to properly respond/comment without
> quoting large sections.
>
> So, despite all the flames, I think there _are_ a few really good ideas
> here, at least as far as deficiencies in G1 (or even G2 VP) which we
> need to resolve. But I don't like Ivan's push-based out-of-order
> rendering pipeline at all. It's highly non-intuitive, and maybe even
> restrictive.
Huh, I'm happy to hear that there are good ideas. You didn't point out
anything good; I see only criticism & flames.

>
> Actually, the name (VO3) reflects what I don't like about it: Ivan's
> design is an api for the codec to _output_ slices, thus calling it video
> output. (In fact, all filter execution is initiated from within the
> codec's slice callback!)
This is one of the possible ways. In the vo2 drafts I wanted to
implement something called automatic slicing - forcing filters to use
slices even when the decoder doesn't support slicing. (I can almost
imagine the flames you are thinking up at the moment ;)
Anyway, my API makes every filter a codec. That's why the scheme looks
so complicated, and that's why the simple-filter type is so necessary.
The full beauty of the API will be seen only by people who write
temporal filters and filters that add/remove frames. That means by
you :O


> On the other hand, I'm looking for an API
> for _obtaining_ frames to show on a display, which might come from
> anywhere -- not just a codec. For instance they might even be
> generated by visualization plugins from audio data, or even from
> /dev/urandom!
Oh, could you explain why my API cannot be used for these things?

> My design makes the source of the video totally
> transparent, rather than making the source the entry point for
> everything! And, my design separates image content processing (which
> might be able to happen out-of-order) from frame processing (which
> always happens in order).

>
> So, Ivan. I'll try to take the best parts of what you've proposed and
> incorporate them into the code for G2. Maybe we'll be able to find
> something we're both happy with.
Wrong. We need something that we are both equally unhappy with :)))
But since you are the one coding it, it is natural for you to implement
your own ideas.

>
> With kind flames,
> Rich
>

Deep water is dangerous
   Ivan Kalvachev
  iive





