[MPlayer-dev-eng] Nut and B frames (and much more) NODAEMON

D Richard Felker III dalias at aerifal.cx
Mon Apr 26 19:40:13 CEST 2004


On Mon, Apr 26, 2004 at 06:51:05PM +0200, Roberto Ragusa wrote:
> On Mon, 26 Apr 2004 07:38:08 -0400
> D Richard Felker III <dalias at aerifal.cx> wrote:
> 
> 
> > But how do B frames (or out-of-order frames in general) fit into this
> > picture?
> 
> Sorry for jumping into this discussion.
> If I've not misunderstood the topics, "nut" is a container format, going to
> compete with avi, ogg/ogm, matroska.

No need to apologize, it's a good question/topic.

> I know that elegant handling of B frames is in general complicated because
> decode order and display order are different, so several strategies can
> be chosen.
> 
> I'd like to raise a possible issue with future advanced video codecs.
> Is the nut container able (in a not hacked and ugly way, I mean) to handle
> codecs that don't operate on single frames?
> We're assuming that the codecs belong to one of these types.
> 1) independent frames (that is IIIIIIII-type, e.g. MJPEG)
> 2) backward dependent frames (that is IPPPIPPP-type, e.g. MPEG without B)
> 3) backward and forward dependent frames (that is IBBBPBBBPBBBI, e.g. full MPEG)
> 
> What about another possibility?
> 
> 4) group of frames (that is GGG, where every G encodes, for example, 25 frames).
> 
> Correct me if I'm wrong, but avi was designed with 1) and 2) in mind and
> some trickery was necessary to have streams of type 3).
> Nut will support 1), 2) and 3) well (the existence of this thread is proof
> of that), but was 4) ever considered until now?
> 
> Just to explain it better, the G type encoder takes 25 frames and
> outputs some bits which describe all the frames. If you miss any of the bits,
> none of the frames is recoverable (in theory).

Yes it's considered. But really you need some way of knowing the
presentation times for all the frames. My thought is that such a codec
should have one large packet for the first of the 25 pictures
containing all the data, then small "presentation packets" for each of
the remaining 24 that just serve as placeholders to tell when each
picture should be shown. When these presentation packets get sent to
the decoder, it outputs the next frame in the sequence.

(Note that I'm using the terms picture and packet instead of frame
because frame in nut terminology means a single coded unit while in
video it means a single picture.)

The other approach would be to just have the one packet for all 25,
with the first PTS, and then have the codec layer generate the rest
with appropriate timestamps. But personally I'm against codecs ever
touching timestamps since most handle time very stupidly.

Anyway I understand your thought that in some ways, video in the time
domain is similar to audio. But it's a lot different too. Audio is
oversampled (44100 or even 48000 or 96000 Hz when you only need about
38000), but video is grossly undersampled (24-30 Hz when you need
about 500 Hz). So doing frequency transforms on video in the time
domain doesn't really make a lot of sense until we have equipment that
can work at 500 fps (or at least 150 fps or so...).
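Taking the figures above at face value, and assuming the 38000 Hz audio figure is the Nyquist rate for content up to about 19 kHz, the ratio of actual sampling rate to needed rate works out as:

```latex
\begin{aligned}
f_s &\ge 2 f_{\max} \quad\Rightarrow\quad 38000\,\text{Hz} \approx 2 \times 19\,\text{kHz} \\
\text{audio: } \frac{44100}{38000} &\approx 1.16, \quad \frac{48000}{38000} \approx 1.26, \quad \frac{96000}{38000} \approx 2.53 \\
\text{video: } \frac{24}{500} &= 0.048, \quad \frac{30}{500} = 0.06, \quad \frac{150}{500} = 0.3
\end{aligned}
```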

> It would be good to debate this issue. For example, is the pts
> supposed to represent the instant at which the frame starts,
> or the instant at the middle of the frame duration? (In my example,
> the G would shift by 0.5 seconds if this isn't well defined.)

This was already answered. In nut, pts MUST ALWAYS represent the start
time of the frame. Otherwise it's useless for most purposes.
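The 0.5 second figure follows if a 25-picture group is played at an assumed 25 fps. With N pictures at a constant frame rate f (also an assumption), the start-time rule gives each picture an unambiguous PTS, while a midpoint rule would move the whole group by half its duration:

```latex
\begin{aligned}
\text{group duration} &= \frac{N}{f} = \frac{25}{25\,\text{s}^{-1}} = 1\,\text{s} \\
\text{midpoint offset} &= \frac{N}{2f} = 0.5\,\text{s} \\
\mathrm{pts}_i &= \mathrm{pts}_0 + \frac{i}{f}, \qquad i = 0, \dots, N-1
\end{aligned}
```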

> Another thing, what if the codec does this:
> 1) takes frames 0-24 and generates a G bitstream
> 2) takes frames 20-44 and generates another G bitstream
> and the decoder is supposed to blend frames 20,21,22,23,24 together
> to avoid "block artifacts" in the time domain?
> In other words, the group of frames are partially overlapped.
> (this already happens in audio coding, right?)

This should work fine with the sort of things I described above.

> I don't know if there is any G-based video codec, but I'm thinking
> about creating one using modified 3D wavelets.
> BTW, is Tarkin dead?

Probably... :)

Rich



