[MPlayer-dev-eng] Nut and B frames (and much more)

Roberto Ragusa mail at robertoragusa.it
Mon Apr 26 18:51:05 CEST 2004


On Mon, 26 Apr 2004 07:38:08 -0400
D Richard Felker III <dalias at aerifal.cx> wrote:


> But how do B frames (or out-of-order frames in general) fit into this
> picture?

Sorry for jumping into this discussion.
If I haven't misunderstood the topic, "nut" is a container format meant to
compete with avi, ogg/ogm and matroska.

I know that elegant handling of B frames is in general complicated because
decode order and display order are different, so several strategies can
be chosen.

I'd like to raise a possible issue with future advanced video codecs.
Is the nut container able to handle, in a clean and non-hackish way, codecs
that don't operate on single frames?
We're assuming the codecs belong to one of these types:
1) independent frames (IIIIIIII-type, e.g. MJPEG)
2) backward-dependent frames (IPPPIPPP-type, e.g. MPEG without B frames)
3) backward- and forward-dependent frames (IBBBPBBBPBBBI-type, e.g. full MPEG)

What about another possibility?

4) groups of frames (GGG-type, where every G encodes, for example, 25 frames).

Correct me if I'm wrong, but avi was designed with 1) and 2) in mind, and
some trickery was necessary to support streams of type 3).
Nut will support 1), 2) and 3) well (the existence of this thread is proof
of that), but has 4) ever been considered?

Just to explain it better: the G-type encoder takes 25 frames and outputs
a bitstream describing all of them. If any part of that bitstream is lost,
none of the frames is recoverable (in theory).

This may look bizarre to some, but my point is that a frame-based
approach to video is just like a sample-based approach to audio. Older
audio codecs work sample by sample (maybe with some prediction, e.g.
ADPCM); newer ones work on groups of samples (exploiting a transform
domain, e.g. MP3).
The same goes for video: with I, P and B frames we're at the ADPCM stage.
Sooner or later we will all use codecs doing transforms in the time
domain, so it's better to be prepared.
Audio has samples, and groups of them are called "audio frames".
Video has frames, and groups of them could be called "sequences"; I
don't know whether a name already exists for them (GOP is not exactly
that).
Video coding has moved from pixels to frames (making it possible to
transform along the x and y axes); the next step is obvious:
transform along t.

A transformation along t has to be well designed to give good
compression (but good solutions can be found), and it needs much more
memory and CPU (but in a few years that will no longer be a problem,
and PCI Express and fully programmable GPUs, i.e. powerful parallel
processors, will certainly help).

It would be good to debate this issue. For example, is the pts supposed
to represent the instant at which the frame starts, or the instant at
the middle of the frame's duration? (In my example, the G would shift by
0.5 seconds if this isn't well defined.)
Another thing: what if the codec does this:
1) takes frames 0-24 and generates a G bitstream
2) takes frames 20-44 and generates another G bitstream
and the decoder is supposed to blend frames 20, 21, 22, 23 and 24
together to avoid "block artifacts" in the time domain?
In other words, the groups of frames partially overlap.
(This already happens in audio coding, right?)

I don't know whether any G-based video codec exists, but I'm thinking
about creating one using modified 3D wavelets.
BTW, is Tarkin dead?

-- 
   Roberto Ragusa    mail at robertoragusa.it



