[MPlayer-G2-dev] vo3

D Richard Felker III dalias at aerifal.cx
Fri Jan 2 04:51:28 CET 2004


On Fri, Jan 02, 2004 at 01:54:34AM +0200, Ivan Kalvachev wrote:
> >> > In my design, this makes no sense. The final scale filter for resizing
> >> > would not pass any frames to the vo until time to display them.
> >> final scale filter?!!!
> >> How many scale filters do you have?
> 
> > Normally only one. But suppose you did something like the following:
> >
> > -vf scale=640:480,pullup
> >
> > with output to vo_x11. The idea is that since you'll be resizing DVD
> > to square pixels anyway, you might as well do it before the inverse
> > telecine and save some cycles.
> 
> Just for the record ;) I'm sure you know that scaling interlaced frames
> vertically will mangle the interlacing.

No, that filter chain only does _horizontal_ scaling (telecine is in
NTSC land, remember? :). Anyway, scaling interlaced frames is perfectly
valid if you scale the fields independently (which G2 should do
automatically), but it's a bad idea before inverse telecine as it will
decrease accuracy.
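
(Field-independent scaling just means running the scaler once per field
with doubled strides - a rough sketch, where scale_plane() is an assumed
per-plane scaler, not a real G2 call:)

/* Hypothetical sketch: scale one interlaced plane field by field.
 * Doubling the stride makes the scaler see a single half-height field. */
void scale_plane(unsigned char *dst, int dst_stride, int dst_h,
                 unsigned char *src, int src_stride, int src_h, int w);

static void scale_interlaced_plane(unsigned char *dst, int dst_stride,
                                   int dst_h, unsigned char *src,
                                   int src_stride, int src_h, int w)
{
    int f;
    for (f = 0; f < 2; f++)
        scale_plane(dst + f * dst_stride, 2 * dst_stride, dst_h / 2,
                    src + f * src_stride, 2 * src_stride, src_h / 2, w);
}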

> > Now the user resizes the window... In yours (and Arpi's old) very bad
> > way, the scale filter gets reconfigured, ruining the fields. If you
> > don't like this example (there are other ways to handle interlacing)
> > then just consider something like denoising with scaling. My principle
> > in response is that the player should NEVER alter configuration for
> > any filters inserted manually by the user. Instead, it should create
> > its own scale filter for dynamic window resizing and final vo
> > colorspace conversion.
> >
> 
> I'm sorry to say it, but you cannot see the forest for the trees (as the
> saying goes).
> 
> I gave it as an example. I like to give examples that are not totally
> fictitious.
> 
> And as you may have noticed, the scale filter is the final one. So having
> two scale filters one after another is not very good.
> 
> And as Diego says it could be used for resizing while paused, and stuff
> like that.
> 
> About scale. Yep, I agree that it is good to have one scale filter at the
> end. If I am not wrong, in G1 the scale filter is currently always inserted
> at one and the same position, which prevents automatic conversion except
> in the ordinary case.
> 
> The other big problem with scale is that it does way too many things. It
> does format conversion, it scales, it does format conversion and scaling at
> the same time. Of course this is done to maximize speed.
> 
> 
> >> >> safe seeking, auto-insertion of filters.
> >> >
> >> > What is safe-seeking?
> >> When seeking, filters that have stored frames should flush them.
> >> For example, right now neither mpeg2 decoder does that, causing garbage
> >> in B-frame decoding after a seek. The same applies to any temporal filter.
> >> In G1 there is control(SEEK,...), but usually it is not used.
> >
> > OK, understood perfectly.
> >
> >> > Auto-insertion is of course covered.
> >> I'm not criticizing your system. This comment is not for me.
> >> Or do I hear irony?
> >
> > And I wasn't criticizing yours, here. I was just saying it's not a
> > problem for either system.
> 
> How about a new level of filter insertion - runtime insertion without
> dropping frames?
> 
> >
> >> >> In short the ideas used are:
> >> >> - common buffer and separate mpi - already exist in g1 in some form
> >> >> - counting buffer usage by mpi and freeing after not used - huh, sounds
> >> >> like java :O
> >> >
> >> > No. Reference counting is good. GC is idiotic. And you should never
> >> >free
> >> > buffers anyway until close, just repool them.
> >> I didn't say that. A free buffer is a buffer that is not busy and can be
> >> reused.
> >> Moreover, I don't like the way frames are locked in your code. It doesn't
> >> seem obvious.
> >
> > In VP, you will _only_ lock frames if you need to keep them after
> > passing them on to the next filter. Normally you shouldn't be doing
> > this.
> >
> 
> At first I couldn't understand what GC is; after I sent my reply I realized
> that it comes from Garbage Collector :O I hate GC, I don't like java
> because of GC. What you may have missed (and been misled about by my java
> comparison) is that buffers have counters. It is simple - on MPI allocation
> the buffer counter is increased, on MPI release it is decreased. When usage
> is 0 the buffer may be reused.
> You see - no locking needed.

Lock and reference/usage count are exactly the same thing. Just
different words for it.
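
In pseudo-C both amount to the same counter; a minimal sketch (the names
are made up here, not the actual vp structures):

/* Illustrative only: a "lock" and a usage counter are the same mechanism.
 * Buffers are never freed until close; refcount == 0 just means reusable. */
typedef struct buffer {
    unsigned char *planes[3];
    int refcount;                       /* 0 = free for reuse */
} buffer_t;

static void buffer_lock(buffer_t *b)    { b->refcount++; }
static void buffer_release(buffer_t *b) { b->refcount--; }

static buffer_t *buffer_get(buffer_t *pool, int n)
{
    int i;
    for (i = 0; i < n; i++)
        if (pool[i].refcount == 0) {    /* not referenced by any mpi */
            pool[i].refcount = 1;
            return &pool[i];
        }
    return 0;                           /* caller grows the pool */
}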

> On the other hand, buffer reusing is of course connected to the
> skipped_blocks optimization.
> But I still have no idea how to do it :(



> 
> 
> >> The goal in my design is for all the work of freeing/querying to
> >> be moved to vf/vp functions. So in my design you will see only
> >> get_image/release_image, but never lock/unlock. Because having a buffer
> >> means that you need it. (well, most of the time)
> >
> > Keep in mind there's no unlock function. It's just get/lock/release.
> >
> > Perhaps you should spend some time thinking about the needs of various
> > filters and codecs. The reason I have a lock function is that the
> > _normal_ case is passing your image on to the next filter without
> > keeping it. Images could start out with 2 locks (1 for source, 1 for
> > dest) and then the source would have to explicitly release it when
> > finished, but IMHO this just adds complexity since most filters should
> > never think about a frame again once sending it out.
> 
> Yes, this is what I do.
> The only difference is that a simple filter may release a frame after
> it passes it to the next one.
> 
> >
> >> >> - allocating all mpi&buffers before starting drawing (looks obvious,
> >> >> doesn't it?) - in G1 filters had to copy frames into their own buffers
> >> >> or gamble by using buffers out of their scope
> >> >
> >> > Yes, maybe G1 was broken. All the codecs/filters I know allocate
> >> > mpi/buffers before drawing, though.
> >> In G1, if you draw_slice out-of-order it is possible to reach a filter
> >> that hasn't yet allocated a buffer for this frame - some frames are
> >> allocated on put_frame.
> >
> > This is because G1 is horribly broken. Slices should not be expected
> > to work at all in G1 except direct VD->VO.
> 
> Ouch. But your current implementation is based on the same principle. ;)

No, not at all. With my implementation slices are always attached to
an mpi, so you know (when that mpi gets to you) that it goes with the
slices you processed previously. In G1, there's no way to tell, except
to assume that all slices between start_slices and put_image go with
the same frame.
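
In other words, the slice callback always carries its owning image -
roughly like this (the signature is illustrative, not the literal vp API):

/* Illustrative: every slice arrives together with the mpi it belongs to,
 * so a filter can match slices to frames even when they come out of order. */
struct vp_filter;
struct mp_image;

void filter_slice_region(struct vp_filter *vf, struct mp_image *mpi,
                         int x, int y, int w, int h);   /* assumed worker */

void draw_slice(struct vp_filter *vf, struct mp_image *mpi,
                int x, int y, int w, int h)
{
    /* mpi identifies the frame; no "current frame" assumption as in G1 */
    filter_slice_region(vf, mpi, x, y, w, h);
}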

> >> That's also the reason to have one common process()!
> >
> > I disagree.
> >
> >> >> - using flag IN_ORDER, to indicate that these frames are "drawn" and
> >> >> that no frames with "earlier" PTS will come.
> >> >
> >> > I find this really ugly.
> >> It's the only sane way to do it, if you really do out-of-order
> >> processing.
> >
> > No, the draw_slice/commit_slice recursion with frames getting pulled
> > in order works just fine. And it's much more intuitive.
> >
> >> >> - using a common function for processing frames and slices - to make
> >> >> slice support easier
> >> >
> >> > This can easily be done at the filter implementation level, if
> >> > possible. In many cases, it's not. Processing the image _contents_ and
> >> > the _frame_ are two distinct tasks.
> >> Not so easy. Very few filters in G1 support slices, mainly because it
> >> is a separate chain.
> >
> > No, mainly because the api is _incorrect_ and cannot work. Slices in
> > G1 will inevitably sig11.
> >
> 
> Haven't you noticed that in G1 slicing is turned on by default?
> ;)

Yes... And I've noticed sig11's with various combinations of crop,
expand, and scale... :(

> > Actually it doesn't. The YV12->NV12 converter can just allow direct
> > rendering, with passthru to the VO's Y plane and its own U/V planes.
> > Then, on draw_slice, the converter does nothing with Y and packs U/V
> > into place in the VO's DR buffer. This is all perfectly valid and no
> > effort to implement in my design.
> >
> > The difficult case is when you want to export some planes and DR
> > others...
> >
> 
> Just one tiny-mini problem. What will be locked if a filter needs the frame
> for later processing?

That depends on which filter does:

If the source (rendering the YV12 into the converter) needs to keep
the frame, it just keeps the DIRECT mpi it obtained from the converter
locked, which in turn keeps the vo's NV12 buffer locked. The release
will later happen recursively, as expected.

If the destination (getting the NV12 image) needs to keep it (dunno
why, since only a vo would want NV12), then the converter isn't
involved at all. The NV12 image is in its own buffer, with which it
can do whatever it wants.

In summary: my system of buffer lock management _inherently_ handles
all cases correctly, with no difficulty or hacks. So there's no need
to ask questions like this.
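
To make the first case concrete, here is a sketch using the generic
get/lock/release operations discussed above (all names hypothetical):

/* The source renders into the converter's DIRECT buffer and wants to keep
 * the frame: it just holds one extra lock, and the final release later
 * propagates through the converter down to the vo's NV12 buffer. */
struct vp_filter;
struct mp_image;

struct mp_image *get_image(struct vp_filter *next, int type, int flags);
void lock_image(struct mp_image *img);
void release_image(struct mp_image *img);
void put_image(struct vp_filter *next, struct mp_image *img);
void render_yv12(struct mp_image *img);        /* assumed: draw the YV12 data */

#define IMGTYPE_DIRECT 1                        /* placeholder value */

static struct mp_image *kept;                   /* frame the source holds on to */

static void source_render_and_keep(struct vp_filter *next)
{
    struct mp_image *img = get_image(next, IMGTYPE_DIRECT, 0);

    render_yv12(img);      /* draw YV12 into the converter's passthru planes */
    lock_image(img);       /* we still need it after handing it on */
    put_image(next, img);  /* converter packs U/V into the vo's NV12 buffer */
    kept = img;            /* later: release_image(kept) - the release
                              propagates recursively to the vo's buffer */
}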

> One possible solution is to have separate buffer for Y,U,V planes, also
> QuantTable, SkippedBlocksTbl. But then we will have to manage with multiple
> buffers into one mpi.

No, the mpi's priv area just needs to keep track of its 'dependencies'
and the bufdesc_t struct has to have the right pointers stored in it.
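
Roughly like this (struct names and layout are only illustrative):

/* When an mpi's planes come from different places (e.g. a passthru Y plane
 * from the vo plus the converter's own U/V), the priv area just lists the
 * buffers it depends on, so releasing the mpi can drop each reference. */
struct buffer;                           /* refcounted, as sketched above */
void buffer_release(struct buffer *b);

typedef struct bufdesc {
    unsigned char *planes[3];            /* may point into different buffers */
    int stride[3];
} bufdesc_t;

typedef struct mpi_priv {
    struct buffer *deps[4];              /* vo Y buffer, own U/V, quant table... */
    int n_deps;
} mpi_priv_t;

static void mpi_release_deps(mpi_priv_t *priv)
{
    int i;
    for (i = 0; i < priv->n_deps; i++)
        buffer_release(priv->deps[i]);
}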

> > Huh? Rounding? WTF? You can't render half a pixel. If a filter is
> > doing slices+resizing (e.g. scale, subpel translate, etc.) it has to
> > deal with the hideous boundary conditions itself...
> >
> 
> Yep, but you forget that x,y,w,h may have any values. Ordinarily
> they will be multiples of 16, but after one resize filter this is no
> longer true. And how are you going to manage things if you get odd
> x,y,w,h in a YV12 image :O

They're invalid, as always. :)

> Now I believe that counting lines from top to bottom is the sanest
> alternative.

I'm still not convinced...but as you can see from some of my previous
posts on this list about slices, I don't like arbitrary unrestricted
slices too much either...

> If the VP3 decoder is the only one that draws from bottom to top, then we
> won't process slices for it. Only full images.

This is stupid, even slower than G1.

> But I doubt that VfW codecs can do it either. Anyway, they don't have slices.

Right, no slices there.

> >> I agree that there may be some problems for vo with one buffer.
> >> So far you have one (good) point.
> >
> > I never said anything about vo with one buffer. IMO it sucks so much
> > it shouldn't even be supported, but then Arpi would get mad.
> >
> 
> The same thing could be extended to a decoder with n=2 static buffers at
> once, and a vo with only 2 buffers. Same for n=n+1;
> 
> Well, after some brainstorming, I take my words back. What the decoder does
> is delay one frame by buffering it. As my whole video filter system acts
> like a codec, we should take control of the buffer delay. This could be
> done by adding a "global" variable LowDelay that contains the number of
> frames we need to wait before starting to display. In the MPEG-1/2/4 case it
> will be 1 or 0; for h264 it may be a little bit higher ;).

This is nonsense. The delay is variable for inverse telecine because
sometimes you need 3 fields to get a frame, and sometimes just 2... :)

> It has nothing to do with buffering ahead. It is just low_delay ripped from
> the codec and put into the video system.

I think you mean "delay". "low_delay" is a flag that means delay==0.
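
Just to pin the terms down, the useful notion is a frame count, not a
flag (illustrative only):

/* "delay" = how many frames the decoder (or the whole chain) buffers
 * before the first output; low_delay simply means delay == 0. */
struct chain_info {
    int delay;     /* 1 or 0 for MPEG-1/2/4, possibly more for h264
                      or an inverse-telecine filter */
};

static int is_low_delay(const struct chain_info *c)
{
    return c->delay == 0;
}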

> >> >> Skipping/Rebuilding
> >> >
> >> > This entire section should be trashed. It's very bad design.
> >> Didn't I say somewhere that it's not finished?
> >
> > Yes. IMO it just shouldn't exist, though. It's unnecessary complexity
> > and part requires sacrificing performance.
> >
> 
> Not really. When a rebuild request appears, a filter may ignore it and
> skip the frame instead of processing it.
> I will try to add a new example.
> 
> Let's say that I liked your idea for aspect processing. There is only one SAR
> (sample aspect ratio). We started decoding and have displayed some
> frames. But the user decides that (s)he doesn't like the resolution and
> switches it (like vo_sdl has a key for switching resolution). Now the DAR
> changes (e.g. from 4:3 to 16:9). This means that the SAR of the image should
> be changed. In the usual case all images and buffers should be flushed,
> including some of the buffers in temporal filters. In other words we have to
> "seek" or start building the video chain again (e.g. vd_ffmpeg::init_vo).

No. If the vo supports hardware scaling, no change is necessary. If
not, then the final scale filter needs to be reconfigured for the new
output aspect (or loaded if none exists already). None of the rest of
the chain needs to be touched.
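
The reconfiguration itself is just a new output size for that one filter,
e.g. (an illustrative helper, assuming square-pixel output):

/* When the display aspect changes, only the auto-inserted scaler's output
 * size is recomputed; nothing upstream is touched. */
static void aspect_output_size(int in_h, int dar_num, int dar_den,
                               int *out_w, int *out_h)
{
    *out_h = in_h;                       /* keep the decoded height */
    *out_w = in_h * dar_num / dar_den;   /* square-pixel width for the new DAR */
}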

You should understand that many temporal filters _cannot_ handle
reconfiguration without creating annoying discontinuities in the
output. So the system should seek to accommodate them.

> What I do is give the filter the ability to decide whether it can and wants
> to continue with the new parameters, or simply to skip frames, emulating a
> "seek" flush.
> 
> As you may have guessed the aspect is transferred by get_image and stored in
> MPI.
> 
> There is another side effect - at the same time there may be images with
> different aspects.

This is very bad. One of my key design points of vp is that you never
tell get_image _any_ image properties, and instead configure the link
for the type of images you'll be passing.
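
That is, roughly (an illustrative struct, not the literal vp header):

/* Image properties are negotiated once per link (re)configuration;
 * get_image never carries per-image properties such as aspect. */
struct vp_link;                  /* opaque link between two filters */
struct mp_image;

typedef struct link_config {
    int width, height;
    unsigned int imgfmt;         /* e.g. YV12 */
    int sar_num, sar_den;        /* sample aspect ratio, fixed until reconfig */
} link_config_t;

int link_configure(struct vp_link *l, const link_config_t *cfg);
struct mp_image *link_get_image(struct vp_link *l, int buf_type, int flags);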

> >> How many scaling filters are you planning to have? Don't you know that
> >> the scale filter is slow?
> >
> > Yes, it's slow. vo_x11 sucks. My point is that the player should
> > _never_ automatically do stuff that gives incorrect output.
> >
> 
> Yep. That's what this rebuild/skipped/invalid mumbo-jumbo is about. The
> filters know best whether they can recreate a frame or should rather skip it.
> 
> The problem is that I do out-of-order rendering, which in the main case means
> that I have some frames that I have already processed but not yet displayed.
> When something changes, I need to rebuild these frames. If I can't, I should
> not display them - skip.

I wonder if there's not a simpler way to do it, when it's possible...

> >> > This is an insurmountible problem. The buffers will very likely no
> >> > longer exist. Forcing them to be kept will destroy performance.
> >> You mean will consume a lot of memory?
> >> huh?
> >
> > No. You might have to _copy_ them, which kills performance. Think of
> > export-type buffers, which are NOT just for obsolete codecs! Or
> > reusable/static-type buffers!
> >
> Well, I don't think that we have to copy them. If they are no longer
> available then we cannot do anything other than skip the frame instead of
> rebuilding it.

This is better.

> >> > Then don't send it to public mailing lists... :)
> >> The author is never limited by the license; I own the full copyright of
> >> this document and may set any rules on it.
> >
> > Yes but you already published it in a public place. :)
> >
> I can do it. I'm the author. I own the copyright.
> You cannot do it. You are not the author.
> Well, I guess this may prevent you from quoting me on the mailing list.
> But I said that it is for mplayer developers' eyes only, so as long as there
> are no users on this list you may quote me ;)
> I will write a better license next time :)

And better learn that MPlayer devs don't care about licenses and
copyrights... :)

> > Another thing is the rebuild idea. Even though I don't see any way it
> > can be done correctly with your proposal, it would be nice to be able
> > to regenerate the current frame. Think of a screenshot function, for
> > example.
> >
> 
> There is an easier way to make a screenshot ;)
> You just need a split filter that uses DR for one of the chains.
> This filter will do no drawing, as it always does DR.
> When a user wants a screenshot, the filter should copy the current frame
> and pass it down the second chain, which ends with vo_png or something like it.
> Even a scale filter may be auto-inserted into the second chain ;)

This isn't possible without an added burden on buffers. The filter
that does the splitting for screenshots has to keep the currently
displayed image locked so it can send it to vo_png or whatever. In
particular, this means PRESERVE and READABLE must be set!!! (If
preserve isn't set, OSD or something might overwrite the picture!)
Also, if the vo only has a small limited number of buffers, you waste
one!

Screenshots are MUCH more complicated than you think!

The other way is to take a screenshot of the _next_ frame, but users might
not like that... :)
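
Roughly what the split filter would have to do - a sketch with
hypothetical names, where READABLE/PRESERVE are the flags in question:

/* The screenshot splitter must hold a READABLE+PRESERVE buffer locked past
 * display time, i.e. it ties up one vo buffer and forbids overwriting it. */
struct vp_filter;
struct mp_image;

struct mp_image *get_image(struct vp_filter *next, int type, int flags);
void lock_image(struct mp_image *img);
void release_image(struct mp_image *img);
void put_image(struct vp_filter *next, struct mp_image *img);
void copy_image(struct mp_image *dst, struct mp_image *src);  /* assumed */
void send_png_chain(struct mp_image *img);     /* second chain -> vo_png */

#define FLAG_READABLE 1        /* placeholder values, illustrative only */
#define FLAG_PRESERVE 2

static void screenshot_put_image(struct vp_filter *next, struct mp_image *in,
                                 int shot_requested)
{
    struct mp_image *out = get_image(next, 0, FLAG_READABLE | FLAG_PRESERVE);

    copy_image(out, in);       /* or DR in the normal, no-copy case */
    lock_image(out);           /* keep it: OSD etc. must not clobber it */
    put_image(next, out);      /* the frame goes on to be displayed */

    if (shot_requested)
        send_png_chain(out);   /* feed the kept frame to vo_png */
    release_image(out);        /* only now may the vo reuse the buffer */
}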

> > And here are some things I really don't like about your approach:
> > - It's push-based rather than pull-based. Thus:
> > - No good way to handle (intentional) frame dropping. This will be a
> > problem _whenever_ you do out-of-order processing, so it happens
> > with my design too. But the only time I do OOO is for slices...
> 
> Actually there is a way and it works the very same way

Please explain. See below..

> > Now that my faith in slices has been shaken, I think it would be
> > really beneficial to see some _benchmarks_. Particularly, a comparison
> > of slice-rendering through scale for colorspace conversion (and
> > possibly also scaling) versus non-slice. If slices don't help (or
> > hurt) for such complex processing (due to cache thrashing from the
> > scale process itself), then I would be inclined to throw away slices
> > for everything except copying to write-only DR buffers while
> > decoding... On the other hand, if they help, then they're at least
> > good for in-order rendering (non-B codecs).
> There is a nice cache examination tool in the valgrind debugger.
> If I remember right, the last time I ran it I got about 30% cache hits.
> Anyway, in the general case slices give a 5-10% speedup. And that is by using
> them only for B-frames (in-order).
> I think that you can imagine the speed-up ;)))

Well do you have any idea for how to reconcile slices with frame
dropping??

Here's the problem:

A filter wants to drop some frames, either to compensate for slow cpu,
or to generate fixed-fps output, or to do smart seeking to non-key
frames, etc. So it somehow needs to signal the earlier part of the
chain when it intends to throw away the next image it gets. This way,
we can skip processing entirely, as long as there are no temporal
filters that need to see all frames.

However, if we run the filter chain out of order (slices), then
the frame currently being processed is _not_ the next frame
_temporally_ which we want to drop, but instead some future frame,
which we don't yet know whether we want to drop!

How do you solve it??
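
(For reference, in an in-order pull chain the hint itself would be
trivial - roughly the sketch below, with hypothetical names. The whole
difficulty is that with out-of-order slices the frame being rendered
right now is not the temporally-next frame whose drop was decided.)

/* Illustrative, in-order case only: the consumer announces that it will
 * throw away its next frame, so upstream can skip the real work for it. */
struct vp_link_state {
    int next_frame_wanted;     /* hypothetical per-link drop hint */
    int has_temporal_filter;   /* temporal filters must see every frame */
};

void decode_skip_frame(void);            /* assumed helpers */
void decode_and_filter_frame(void);

static void request_drop(struct vp_link_state *l)
{
    l->next_frame_wanted = 0;            /* consumer will discard the next one */
}

static void produce_one_frame(struct vp_link_state *l)
{
    if (!l->next_frame_wanted && !l->has_temporal_filter) {
        decode_skip_frame();             /* cheap: no filtering, no slices */
        l->next_frame_wanted = 1;        /* hint consumed */
        return;
    }
    decode_and_filter_frame();
}
/* With out-of-order slices this breaks down: the frame currently being
 * rendered is some future frame whose drop decision isn't known yet. */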

> > And Ivan, remember: a (pseudocode?) port of vf_pullup to your layer
> > might be very useful in convincing me of its merits or demerits. Also
> > feel free to criticize the way pullup.c does all its own internal
> > buffer management -- perhaps you'd prefer it obtain buffers from the
> > next filter. :) This can be arranged if it will help.
> Will Do it later. (in next mail:)
> I'm first gonna add few simple example filters ;)

OK, thanks!

> As a whole, my current design only makes sense when using slices.

Yes...

> The whole concept is to process the data after it has been decoded and
> while it is still in the cache. For this you need 2 things - to process the
> data in the order it is decoded (out-of-order is in fact decode order), and
> to process it in small pieces that fit into the cache, but not so small that
> function call overhead dominates.

There's also the question of whether all the _code_ for running the
whole chain with slices will thrash the cache. In G1, everything that
uses slices is trivial code, so it doesn't matter, but if you're doing
complicated processing the game could change...

Rich

P.S. Could you avoid using the mswindows charset in emails?? (–, etc.)



