[MPlayer-dev-eng] Nut and B frames (and much more)

D Richard Felker III dalias at aerifal.cx
Tue Apr 27 18:33:07 CEST 2004


On Tue, Apr 27, 2004 at 05:13:31PM +0200, Michael Niedermayer wrote:
> > The other approach would be to just have the one packet for all 25,
> > with the first PTS, and then have the codec layer generate the rest
> > with appropriate timestamps. But personally I'm against codecs ever
> > touching timestamps since most handle time very stupidly.
> yet another approach is to store the individual wavelet-transformed frames 

We were talking about a 3D wavelet transform, so individual frames no
longer exist after the transform...

> with some reordering, and I'm almost certain this is what the standards 
> committees will do if they decide to standardize such a thing
> why?
> simple, it's much more flexible: just discard the temporal high-frequency 
> subband(s) and you get a video with 1/2, 1/4, ... fps, or discard the 
> spatial high-frequency subband(s) and you get 1/2, ... the resolution

Ugh, these standards committees and their stupid temporal scalability
goals... When will they learn that 1/2 fps looks totally horrible and
they'd be better off just dropping to 1/4 or 1/16 resolution???
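
(For readers who haven't seen the trick being described: one level of
it looks roughly like the sketch below, a temporal Haar lift over
pairs of frames. All names are invented for illustration; this is not
code from any real codec.)

#include <stdint.h>

/* One level of a temporal Haar lift: the high band holds per-pixel
 * differences between frame pairs, the low band their (lifted)
 * averages.  A decoder that throws high[] away still has a valid
 * sequence of averaged frames at half the fps; recursing on low[]
 * gives 1/4, 1/8, ... fps, which is the scalability being sold. */
static void temporal_haar(uint8_t **frames, int nframes, int npixels,
                          int16_t **low, int16_t **high)
{
    int p, i;
    for (p = 0; p < nframes / 2; p++) {
        const uint8_t *a = frames[2*p], *b = frames[2*p + 1];
        for (i = 0; i < npixels; i++) {
            high[p][i] = a[i] - b[i];              /* detail       */
            low[p][i]  = b[i] + (high[p][i] >> 1); /* ~ (a+b)/2    */
        }
    }
}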

> > Anyway I understand your thought that in some ways, video in the time
> > domain is similar to audio. But it's a lot different too. Audio is
> > oversampled (44100 or even 48000 or 96000 Hz when you only need about
> > 38000), but video is grossly undersampled (24-30 Hz when you need
> > about 500 Hz). So doing frequency transforms on video in the time
> > domain doesn't really make a lot of sense until we have equipment that
> > can work at 500 fps (or at least 150 fps or so...).
> that's surely true, unless the temporal transform is done along the motion 
> trajectories

If real motion estimation is even possible at only 24-30 fps...
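
(Aside: the numbers quoted above are just Nyquist, f_sample >= 2 * f_max.
Hearing rolls off around 19 kHz, so you need roughly 2 * 19000 = 38000 Hz,
and 44100/48000/96000 Hz is comfortably oversampled.)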

> > > Another thing, what if the codec does this:
> > > 1) takes frames 0-24 and generates a G bitstream
> > > 2) takes frames 20-44 and generates another G bitstream
> > > and the decoder is supposed to blend frames 20,21,22,23,24 together
> > > to avoid "block artifacts" in the time domain?
> > > In other words, the group of frames are partially overlapped.
> > > (this already happens in audio coding, right?)
> >
> > This should work fine with the sort of things I described above.
> yes for video, but what about audio with a delay? should the pts represent 
> the first sample the decoder will output for the given bitstream packet? I 
> guess so; it seems to be the most obvious choice

It depends on whether you want to do ugly hacks to accommodate broken
codec implementations that can't pass pts through, or whether you want
to store the correct information. IMO nut pts should _always_ be
actual pts for the data encoded in the packet, not dts. But audio
decoders are broken and often hide the relationship between input
frames and output samples... Of course a good container would encourage people
to write codecs that don't suck...
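
To make that concrete, here's roughly what keeping timestamps out of
codec hands looks like on the player side. This is a minimal sketch
assuming a one-packet-in/one-block-out codec with a fixed delay; every
name in it, codec_decode() included, is made up, not a real mplayer,
nut, or libavcodec interface.

#include <stdint.h>

#define MAX_DELAY 16

/* hypothetical codec entry point: returns >0 once it starts
 * producing output, <=0 while it's still swallowing its delay */
int codec_decode(void *codec, const uint8_t *pkt, int len,
                 int16_t *samples, int *nsamples);

/* tiny pts FIFO: timestamps ride AROUND the codec, never through it */
typedef struct {
    int64_t q[MAX_DELAY];
    unsigned head, tail;
} pts_fifo_t;

static void fifo_put(pts_fifo_t *f, int64_t pts)
{
    f->q[f->tail++ % MAX_DELAY] = pts;
}

static int64_t fifo_get(pts_fifo_t *f)
{
    return f->q[f->head++ % MAX_DELAY];
}

/* The container stores pts = time of the first sample ENCODED in the
 * packet.  While the codec is still buffering, nothing comes out and
 * no timestamp is consumed; once output appears, the oldest queued
 * pts is the one that belongs to it. */
static int decode_one(void *codec, pts_fifo_t *f,
                      const uint8_t *pkt, int len, int64_t pkt_pts,
                      int16_t *samples, int *nsamples, int64_t *out_pts)
{
    fifo_put(f, pkt_pts);
    if (codec_decode(codec, pkt, len, samples, nsamples) <= 0)
        return 0;               /* still buffering, no output yet */
    *out_pts = fifo_get(f);     /* pts of the data just decoded   */
    return 1;
}

With the container storing real pts instead of dts, that little FIFO
is all the layer above the codec ever needs.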

Rich
