D Richard Felker III
dalias at aerifal.cx
Sun Dec 28 06:18:30 CET 2003
> User-Agent: SquirrelMail/1.4.1
Ivan, if you expect a reply, please get a mailer that understands how
to wrap lines at 80 columns and preserve formatting when quoting. Your
broken SquirrelMail converted all the quotes into multi-hundred-column
On Sun, Dec 28, 2003 at 04:51:23AM +0200, Ivan Kalvachev wrote:
> D Richard Felker III said:
> > On Sat, Dec 27, 2003 at 01:01:22AM +0200, Ivan Kalvachev wrote:
> >> Hi, here are some of my ideas.
> >> I'm afraid that it is already too late for them to be implemented, as
> >> dalias is coding his pipeline system, while I have not finished the drafts
> >> yet, but feel free to send comments...
> > Time to pour on the kerosine and light the flames... :)
> You are very bad at making good points, you always fall into flames. I
> noticed that you like flames, even if you burn in them sometimes.
This line was a joke. Maybe if you'd taken it as such you would have
been more thoughtful in your responses to what followed.
> > First comment: putting everything in a PDF file makes it very
> > difficult to quote and reply to individual parts of your draft.
> > I'll try my best... [all following quotes are from the PDF]
> Yep, The original is in OOo format. And You told me that you don't want to
> install "that bloat"
Well yeah, PDF was better than OOo.
> >> Here are a few features that I'm trying to achieve:
> >> - decreasing memcpy by using one and the same buffer by all filters
> >>   that can do it (already done in g1 as Direct Rendering method 1)
> >> - support of partial rendering (slices and DR m2)
> > These are obviously necessary for any system that's not going to be
> > completely unusably sucky. And they're already covered in G1 and G2 VP.
> flame. I know that, you know that, everybody knows that.
Huh? Who am I flaming? I don't follow.
> >> ability to quickly reconfigure and if possible - to reuse data that
> >> is already processed (e.g. we have scale and the user resizes the
> >> image, - only images after scale will be redone),
> > In my design, this makes no sense. The final scale filter for resizing
> > would not pass any frames to the vo until time to display them.
> final scale filter?!!!
> How many scale filters do you have?
Normally only one. But suppose you did something like the following:
with output to vo_x11. The idea is that since you'll be resizing DVD
to square pixels anyway, you might as well do it before the inverse
telecine and save some cycles.
Now the user resizes the window... In yours (and Arpi's old) very bad
way, the scale filter gets reconfigured, ruining the fields. If you
don't like this example (there are other ways to handle interlacing)
then just consider something like denoising with scaling. My principle
in response is that the player should NEVER alter configuration for
any filters inserted manually by the user. Instead, it should create
its own scale filter for dynamic window resizing and final vo
> >> safe seeking, auto-insertion of filters.
> > What is safe-seeking?
> When seeking, filters that have stored frames should flush them.
> For example, now both mpeg2 decoders don't do that, causing garbage in
> B-Frames decoding after seek. The same applies to any temporal filter.
> In G1 there is control(SEEK,...), but usually it is not used.
OK, understood perfectly.
> > Auto-insertion is of course covered.
> I'm not criticizing your system. This comment is not for me.
> Or I hear irony?
And I wasn't criticizing yours, here. I was just saying it's not a
problem for either system.
> >> In short the ideas used are:
> >> - common buffer and separate mpi (already exist in g1 in some form)
> >> - counting buffer usage by mpi and freeing after not used (huh, sounds
> >>   like java :O)
> > No. Reference counting is good. GC is idiotic. And you should never free
> > buffers anyway until close, just repool them.
> I didn't say that. Free buffer is buffer that is not busy and could be
> Moreover I don't like the way frames are locked in your code. It doesn't
> seem obvious.
In VP, you will _only_ lock frames if you need to keep them after
passing them on to the next filter. Normally you shouldn't be doing
> The goal in my design is for all the work of freeing/querying to
> be moved to vf/vp functions. So in my design you will see only
> get_image/release_image, but never lock/unlock. Because having a buffer
> means that you need it. (well most of the time)
Keep in mind there's no unlock function. It's just get/lock/release.
Perhaps you should spend some time thinking about the needs of various
filters and codecs. The reason I have a lock function is that the
_normal_ case is passing your image on to the next filter without
keeping it. Images could start out with 2 locks (1 for source, 1 for
dest) and then the source would have to explicitly release it when
finished, but IMHO this just adds complexity since most filters should
never think about a frame again once sending it out.
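
A minimal sketch of the get/lock/release scheme described above, assuming a
fixed-size pool and a plain reference count. All names here (image_t, pool_t,
image_get, image_lock, image_release) are hypothetical and are not the actual
G2 VP API. It only illustrates that an image starts with one reference, that
lock is needed only by a filter keeping the image after passing it on, and
that a released image is repooled rather than freed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not the actual G2 VP API. */
#define POOL_SIZE 4

typedef struct image {
    int refs;   /* 0 means the buffer sits idle in the pool */
} image_t;

typedef struct pool {
    image_t img[POOL_SIZE];
} pool_t;

/* get: hand out an idle image holding exactly one reference (the caller's). */
static image_t *image_get(pool_t *p)
{
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (p->img[i].refs == 0) {
            p->img[i].refs = 1;
            return &p->img[i];
        }
    }
    return NULL; /* pool exhausted */
}

/* lock: take an extra reference; only a filter that keeps the image
 * after passing it to the next filter needs to call this. */
static void image_lock(image_t *im)
{
    assert(im->refs > 0);
    im->refs++;
}

/* release: drop one reference; at zero the image is simply available
 * for image_get again (repooled), never freed. */
static void image_release(image_t *im)
{
    assert(im->refs > 0);
    im->refs--;
}
```

In the normal case a filter calls image_get, passes the image on, and the
receiver calls image_release; there is no separate unlock because release
undoes both get and lock.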
> >> allocating all mpi&buffer before starting drawing (look obvious,
> >> doesn't it?) in G1 filters had to copy frames in its own buffers or
> >> play hazard by using buffers out of their scope
> > Yes, maybe G1 was broken. All the codecs/filters I know allocate
> > mpi/buffers before drawing, though.
> In G1 if you draw_slice out-of-order it is possible to go to a filter that
> hasn't yet allocated a buffer for this frame - some frames are allocated on
This is because G1 is horribly broken. Slices should not be expected
to work at all in G1 except direct VD->VO.
> That's also the reason to have one common process()!
> >> using flag IN_ORDER, to indicate that these frames are "drawn" and
> >> there won't come frames with "earlier" PTS.
> > I find this really ugly.
> It's the only sane way to do it, if you really do out-of-order processing.
No, the draw_slice/commit_slice recursion with frames getting pulled
in order works just fine. And it's much more intuitive.
> >> using common function for processing frame and slices to make slice
> >> support more easier
> > This can easily be done at the filter implementation level, if
> > possible. In many cases, it's not. Processing the image _contents_ and
> > the _frame_ are two distinct tasks.
> Not so easy. Very few filters in G1 support slices, mainly because it is
> a separate chain.
No, mainly because the api is _incorrect_ and cannot work. Slices in
G1 will inevitably sig11.
> > One thing omitted in G2 so far is allowing for mixed buffer types, where
> > different planes are allocated by different parties. For
> > example, exporting U and V planes unchanged and direct rendering a new Y
> > plane. I'm not sure if it's worth supporting this, since it would be
> > excessively complicated. However, it would greatly speed up certain
> > filters such as equalizer.
> Yes I was thinking about such hacks. But definitely they are not worth
> implementing. Matrox YUV mode needs such a hack, but it could be done in vo
Actually it doesn't. The YV12->NV12 converter can just allow direct
rendering, with passthru to the VO's Y plane and its own U/V planes.
Then, on draw_slice, the converter does nothing with Y and packs U/V
into place in the VO's DR buffer. This is all perfectly valid and no
effort to implement in my design.
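
As an illustration of the U/V packing step only, here is a small sketch,
assuming 8-bit planar chroma input and the usual NV12 layout (one interleaved
chroma plane with U before V). The function name pack_uv_nv12 and its
parameters are hypothetical. The Y plane is not touched here, since in the
scheme above the decoder renders it directly into the VO's buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Interleave separate U and V planes (YV12-style) into a single NV12
 * chroma plane. cw and ch are the chroma width and height in samples;
 * strides are in bytes. Hypothetical helper, not MPlayer code. */
static void pack_uv_nv12(uint8_t *dst, size_t dst_stride,
                         const uint8_t *u, const uint8_t *v,
                         size_t src_stride, size_t cw, size_t ch)
{
    assert(dst_stride >= 2 * cw && src_stride >= cw);
    for (size_t y = 0; y < ch; y++) {
        uint8_t *d = dst + y * dst_stride;
        const uint8_t *pu = u + y * src_stride;
        const uint8_t *pv = v + y * src_stride;
        for (size_t x = 0; x < cw; x++) {
            d[2 * x]     = pu[x]; /* NV12 stores U (Cb) first */
            d[2 * x + 1] = pv[x]; /* then V (Cr) */
        }
    }
}
```

Called per slice, the same loop would cover only the chroma rows belonging
to that slice.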
The difficult case is when you want to export some planes and DR
> >> Dalias already pointed out that processing may not be strictly top to
> >> bottom, may not be line, slice, or block based. This question is still
> >> open for discussion. Anyway the most flexible x,y,w,h way proved to be
> >> also the hardest and totally painful. Just take a look at the crop or
> >> expand filters in G1. Moreover the current G1
> >> scheme has some major flaws:
> >> the drawn rectangles may overlap (it depends only on decoder)
> > No, my spec says that draw_slice/commit_slice must be called exactly
> > once for each pixel. If your codec is broken and does not honor this,
> > you must wrap it or else not use slices.
> The problem may arise in filter slices too! Imagine rounding errors;)
Huh? Rounding? WTF? You can't render half a pixel. If a filter is
doing slices+resizing (e.g. scale, subpel translate, etc.) it has to
deal with the hideous boundary conditions itself...
> >> add new flag that I call IN_ORDER. This flag indicates that all frames
> >> before this one are already available in the in-coming/out-coming
> >> area. Let's make an example with MPEG IPB order. We have decoding
> >> order IPB and display IBP.
> >> First we have I frame. We decode it first and we output it to the
> >> filters. This frame is in order so the flag should be set for it
> >> (while processing). Then we have P-Frame. We decode it, but we do not
> >> set the flag (yet). We process the P-Frame too. Then we decode a
> >> B-Frame that depends on the previous I and P Frames. This B-Frame is
> >> in order when we process it. After we finish with the B-Frame(s) the
> >> first P-Frame is in order.
> > This idea is totally broken, as explained by Michael on ffmpeg-devel.
> > It makes it impossible for anything except an insanely fast computer
> > to play files with B frames!! Here's the problem:
> > 1. You decode first I frame, IN_ORDER.
> > 2. You display the I frame.
> > 3. You decode the P frame. Not IN_ORDER.
> > 4. You decode the B frame. IN_ORDER.
> > 5. You display the B frame, but only after wasting >frametime seconds,
> > thus causing A/V desync!!
> > 6. The P frame becomes IN_ORDER.
> > 7. You display the P frame.
> > 8. Process repeats.
> > The only solution is to always impose one-frame delay at the _decoder_
> > end when decoding files with B frames. In Ivan's design, this can be
> > imposed by waiting to set the IN_ORDER flag for an I/P frame until the
> > next B frame is decoded.
> Again you say things that i haven't. Actually I (or you ) may have missed
> one of the points. Well I will add it to the goals. Buffering ahead. Now.
> I said that IN_ORDER is replacement for the draw_frame()!!!!
Ahh, that's a _very_ helpful way to think about it. Thanks! I still
don't like it, but at least I don't think your system is total
> This means that in the above example the I-frame won't be IN_ORDER. Your
> problem solved.
Yes, I agree totally. That's solved.
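
One way to picture the one-frame delay is the conventional reorder buffer
sketched below. This is an assumption for illustration, not either proposal's
actual mechanism: each I/P frame is held back until the next I/P frame
arrives, while B frames pass straight through, so frames leave in display
order. The names ro_frame_t, reorder_t, reorder_push and reorder_flush are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ro_frame {
    char type;  /* 'I', 'P' or 'B' */
    int pts;    /* display timestamp */
} ro_frame_t;

typedef struct reorder {
    const ro_frame_t *held;  /* last I/P frame, not yet in display order */
} reorder_t;

/* Feed one frame in decoding order. Returns the frame that is now in
 * display order, or NULL if nothing can be output yet. */
static const ro_frame_t *reorder_push(reorder_t *r, const ro_frame_t *f)
{
    assert(f->type == 'I' || f->type == 'P' || f->type == 'B');
    if (f->type == 'B')
        return f;  /* B frames display before the held reference frame */
    const ro_frame_t *out = r->held;
    r->held = f;   /* delay this I/P frame by one reference frame */
    return out;
}

/* At end of stream, output the last held reference frame (or NULL). */
static const ro_frame_t *reorder_flush(reorder_t *r)
{
    const ro_frame_t *out = r->held;
    r->held = NULL;
    return out;
}
```

Feeding decode order I0 P2 B1 P4 B3 (numbers are pts) yields output in pts
order, with the first I frame appearing only after the P frame has been
decoded.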
> Anyway the IN_ORDER doesn't force us to display the frame.
> There is no need to start displaying frame in the moment they are
Yes, but it's hard to know when to display unless you're using threads
(or the cool pageflip-from-slice-callback hack :))
> I agree that there may be some problems for vo with one buffer.
> So far you have one (good) point.
I never said anything about vo with one buffer. IMO it sucks so much
it shouldn't even be supported, but then Arpi would get mad.
> >> As you can see it is very easy for the decoders to set the IN_ORDER
> >> flag, it could be done at G1's decode() end, when the frames are in order.
> > Actually, this is totally false. Libavcodec does _not_ export any
> > information which allows the caller to know if the frames are being
> > decoded in order or not. :( Yes, this means lavc is horribly broken...
> avcodec always displays frames in order, unless you manually set flags like
> _OUT_OF_ORDER or _LOW_DELAY ;)
No. Keep in mind that your chain will be running from the
draw_horiz_band callback... (in which case, it will be out of order) I
would expect you to set the LOW_DELAY flag under these circumstances,
but maybe you wouldn't.
> >> If an MPI is freed without setting IN_ORDER then we could guess that it
> >> have been skipped.
> > Frame sources cannot be allowed to skip frames. Only the destination
> > requesting frames can skip them.
> If this rule is removed then IN_ORDER doesn't have any meaning. Usually a
> filter that makes such frames is broken. A filter that wants to remove
> duplicated frames may set flag SKIPPED (well if such flag exists;)
> SKIPPED/INVALID is required because there are always 2 mpi's that point to
> one buffer (vf1->out and vf_2->in )
I misunderstood IN_ORDER. SKIPPED makes sense now, it's just not quite
the way I would implement it.
> >> Skipping/Rebuilding
> > This entire section should be trashed. It's very bad design.
> did I say somewhere - not finished?
Yes. IMO it just shouldn't exist, though. It's unnecessary complexity
and part requires sacrificing performance.
> >> Now the skipping issue is rising. I propose 2 flags, that should be
> >> added like the IN_ORDER flag, I call them SKIPPED and REBUILD. I thought
> >> about one common INVALID, but it would have a different meaning
> >> depending on the array it resides in (incoming or outgoing)
> >> SKIPPED is required when a get_image frame is gotten but the
> >> processing is not performed. The first filter sets this flag in the
> >> outgoing mpi, and when the next filter processes the data, it should free
> >> the mpi (that is now in the incoming). If the filter had allocated another
> >> frame, where the skipped frame should have been drawn, then it can free
> >> it by setting it as SKIPPED.
> > Turn things around in the only direction that works, and you don't need
> > an image flag for SKIPPED at all. The filter _requesting_ the image
> > knows if it intends to use the contents or not, so if not, it just
> > ignores what's there. There IS NO CORRECT WAY to frameskip from the
> > source side.
> I'm not talking about skipping of frames to maintain A-V sync.
> And decoders are on the source side, they DO skip frames. And in this
> section I use SKIPPED in the meaning of INVALID, as you can see from the
I couldn't tell. If the codec skipped a frame at user-request, it
would also be invalid...
> how many scaling filters are you planning to have? don't you know that
> scale filter is slow?
Yes, it's slow. vo_x11 sucks. My point is that the player should
_never_ automatically do stuff that gives incorrect output.
> > Bad point 2: your "rebuild" idea is not possible. Suppose the scale
> > filter has stored its output in video memory, and its input has
> > already been freed/overwritten. If you don't allow for this,
> > performance will suck.
> If you had read carefully you would see that I had pointed out that problem
> too (with a solution I don't like very much). That's the main reason this
> section is not completed.
> >> [...]
> >> -vf spp=5,scale=512:384,osd
> >> [...]
> >> Now the user turns off OSD that has been already rendered into a
> >> frame. Then vf_osd sets REBUILD for all affected frames in the
> >> incoming array. The scale filter will draw the frame again, but it
> >> won't call spp again. And this gives a big win because vf_spp could
> >> be extremely slow.
> > This is stupid. We have a much better design for osd: as it
> > slice-renders its output, it makes backups (in very efficient form) of
> > the data that's destroyed by overwriting/alphablending. It can then undo
> > the process at any time, without ever reading from its old input buffers
> > or output buffers. In fact, it can handle slices of any shape and size,
> OSD is only EXAMPLE. not the real case.
> Well then I gave a bad example. In fact REBUILD is necessary when a filter
> uses a buffer that is requested by the previous filter. Also if vo
> invalidates the buffer for some reason, this is the only way it could signal
> the rest of the filters.
Invalidating buffers is a problem...
> Yeah, these issues are raised by the way I handle mpi/buffer, but I have not
> seen any such system so far. Usually in such a situation all filters will
> get something like a reset and will start from the next frame. Of course this
> could be a lot of pain in an out-of-order scheme!
It's not really too bad. Although ideally it should be possible to
make small changes to the filter chain _without_ any discontinuity in
the output video...
> > This is an insurmountable problem. The buffers will very likely no
> > longer exist. Forcing them to be kept will destroy performance.
> You mean it will consume a lot of memory?
No. You might have to _copy_ them, which kills performance. Think of
export-type buffers, which are NOT just for obsolete codecs! Or
> >> 1. Interlacing: should the second field have its own PTS?
> > In principle, definitely yes. IMO the easiest way to handle it is to
> > require codecs that output interlaced video to set the duration field,
> > and then pts of the second field is just pts+duration/2.
> Why? Just because you like it that way?
Yes. Any other way is fine too. Unfortunately it's impossible to
detect whether the source video is interlaced or not (stupid flags are
always wrong), so some other methods such as always treating fields
independently are troublesome...
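
The pts+duration/2 rule quoted above is simple enough to state as code. The
sketch below assumes integer timestamps in some fixed timebase (for example
90 kHz ticks); the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct field_timing {
    int64_t pts;       /* presentation time of the first field, in ticks */
    int64_t duration;  /* duration of the whole frame, same units */
} field_timing_t;

/* The second field is shown halfway through the frame's duration. */
static int64_t second_field_pts(const field_timing_t *t)
{
    assert(t->duration > 0);  /* the codec must have set the duration */
    return t->pts + t->duration / 2;
}
```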
> > Then don't send it to public mailing lists... :)
> The author is never limited by the license, I own full copyright of this
> document and I may set any rules on it.
Yes but you already published it in a public place. :)
> > So, despite all the flames, I think there _are_ a few really good ideas
> > here, at least as far as deficiencies in G1 (or even G2 VP) which we
> > need to resolve. But I don't like Ivan's push-based out-of-order
> > rendering pipeline at all. It's highly non-intuitive, and maybe even
> > restrictive.
> Huh, I'm happy to hear that there are good ideas. You didn't point out
> anything good. I see only criticism & flames.
Sorry, I wasn't at all clear. The best ideas from my standpoint were
the ones that highlighted deficiencies in my design, e.g. the
buffers-from-multiple-sources thing. Even though I flame them, I also
sort of like your slice ideas, but basically every way of doing slices
Another thing is the rebuild idea. Even though I don't see any way it
can be done correctly with your proposal, it would be nice to be able
to regenerate the current frame. Think of a screenshot function, for
> > Actually, the name (VO3) reflects what I don't like about it: Ivan's
> > design is an api for the codec to _output_ slices, thus calling it video
> > output. (In fact, all filter execution is initiated from within the
> > codec's slice callback!)
> This is one of the possible ways. In the vo2 drafts I wanted to implement
> something called automatic slicing - forcing filters to use slices even
> when the decoder doesn't support slicing. (I can nearly imagine the flames
> you are thinking of at the moment;)
I understand what you're saying. I'm just strongly opposed to the main
entry point being at the codec end. In particular, it does not allow
cpu-saving frame dropping. Only in a pull-based system where you wait
to decode/process a frame until the next filter wants it can you skip
(expensive!) processing (or even decoding, for B frames!) based on
whether the output is destined for the encoder/monitor or the
Ultimately, slice _processing_ isn't very friendly to this goal. The
more we discuss it, the more I'm doubting that slice processing is
even useful. On the one hand it's very nice, for optimizing cache
usage, but on the other, it forces you to process frames before you
even want them. This is a _big_ obstacle to framedropping, and to
smooth playback, since displaying certain frames might require no
processing, and displaying others might require processing 2 or 3
frames first... :((
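
To make the pull-based frame-dropping argument concrete, here is a toy
sketch. It assumes a source that does its expensive work only when the
consumer asks for a frame and says it wants the contents. The types and the
pull_frame function are hypothetical, and the processed counter merely
stands in for real decoding/filtering cost.

```c
#include <assert.h>

/* Hypothetical pull-style source: work happens only on request. */
typedef struct source {
    int next;       /* index of the next frame in display order */
    int processed;  /* how many frames were actually decoded/filtered */
} source_t;

typedef struct pulled_frame {
    int index;
    int valid;      /* nonzero if the contents were actually produced */
} pulled_frame_t;

/* The consumer decides whether it needs the contents. When it is
 * dropping the frame, the expensive step is skipped entirely. */
static pulled_frame_t pull_frame(source_t *s, int want_contents)
{
    assert(s->next >= 0);
    pulled_frame_t f = { s->next++, 0 };
    if (want_contents) {
        s->processed++;  /* stands in for the expensive decode/filter step */
        f.valid = 1;
    }
    return f;
}
```

In a push-based chain the equivalent decision would have to be made at the
source, before the consumer knows whether it is running late.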
> Anyway my API makes all filters codecs. That's why the scheme looks so
> complicated, and that's why a simple filter is so necessary. The full beauty
> of the API will be seen only by people that make temporal filters and
> add/remove frames. This means you :O
Perhaps you could port vf_pullup to pseudocode using your api and see
if you could convince me?
> > On the other hand, I'm looking for an API
> > for _obtaining_ frames to show on a display, which might come from
> > anywhere -- not just a codec. For instance they might even be
> > generated by visualization plugins from audio data, or even from
> Oh, could you explain why my API cannot be used for these things?
It's _called_ from the codec's draw_slice! Not very good at all for
multiple video sources, e.g. music + music video + overlaid
> > So, Ivan. I'll try to take the best parts of what you've proposed and
> > incorporate them into the code for G2. Maybe we'll be able to find
> > something we're both happy with.
> Wrong, we need something that we both are equally unhappy with:)))
> But as long as you are the one coding, it is natural for you to implement
> your ideas.
So, now let me make some general remarks (yes, this is long
After this email, I understand your proposal a lot better. The big
difference between our approaches is that I treat buffers (including
"indirect" buffers) as objects which filters obtain and hold onto
internally and which they only "pass along" when it's time to display
them, while you treat buffers as entities which are carefully managed
in a queue between each pair of filters, which can be processed
immediately, and which are only "activated" (IN_ORDER flag) when it's
actually their time.
Here are some things I like better about your approach:
- It's very easy to cancel buffers when unloading/resetting filters.
- Buffer management can't be 'hidden' inside the filters, meaning that
  we're less likely to have leaks/crashes from buggy filters.
- Processing can be done in decoding order even when slices aren't
  supported (dunno whether this actually happens).
- Slices are fairly restricted, easing implementation.
And here are some things I really don't like about your approach:
- It's push-based rather than pull-based. Thus:
- No good way to handle (intentional) frame dropping. This will be a
  problem _whenever_ you do out-of-order processing, so it happens
  with my design too. But the only time I do OOO is for slices...
- Slices are fairly restricted, limiting their usefulness.
- Having the chain run from the decoder's callback sucks. :(
- It doesn't allow "dumb slices" (small reused buffer).
- It doesn't have a way to handle buffer age/skipped blocks (I know,
  my design doesn't solve this either...)
- My YV12->NV12 conversion might not be possible with your buffer
Now that my faith in slices has been shaken, I think it would be
really beneficial to see some _benchmarks_. Particularly, a comparison
of slice-rendering through scale for colorspace conversion (and
possibly also scaling) versus non-slice. If slices don't help (or
hurt) for such complex processing (due to cache thrashing from the
scale process itself), then I would be inclined to throw away slices
for everything except copying to write-only DR buffers while
decoding... On the other hand, if they help, then they're at least
good for in-order rendering (non-B codecs).
And Ivan, remember: a (pseudocode?) port of vf_pullup to your layer
might be very useful in convincing me of its merits or demerits. Also
feel free to criticize the way pullup.c does all its own internal
buffer management -- perhaps you'd prefer it obtain buffers from the
next filter. :) This can be arranged if it will help.
As much as I dislike some of your ideas, I'm open to changing things
to be more like what you propose. I want G2 to be the best possible
tool for video! And that matters more than ego/flames/eliteness/etc.
Maybe you'll even get me to write your (modified) design for you, if
you come up with convincing proposals... :))