[MPlayer-dev-eng] Nut proposal [Re: Cleaning your nuts]

D Richard Felker III dalias at aerifal.cx
Thu Apr 22 18:39:29 CEST 2004


On Thu, Apr 22, 2004 at 05:57:09PM +0200, Michael Niedermayer wrote:
> > OK, time for a proposal based on these points. Here's my proposed nut
> > frame structure:
> >
> > header_field    v
> > size_msb        v
> > (stream_id)     v
> > (pts)           v
> > data
> >
> > (stream_id and pts are only optionally coded as vlc depending on the
> > contents of header_field)
> one issue with this is CBR audio, where the packet size is always equal to 
> either of the last 2 sizes

See my other post. IMO CBR is irrelevant since CBR is inherently
wasteful of space. If the luser encoding the file can afford to waste
30-50% size on CBR, they can afford to waste 1% more on container
overhead.

> > bit 7: keyframe flag
> s/bit 7/bit 6/

Yes, thanks.

> > First some points about stream code. In a simple 2-stream file, you'll
> > never need to vlc code stream_id. Coding as -1/same/+1 always works.
> > Figuring out that this should work well in the case with more than 2
> > streams is left for the reader. :)
> like with the frame_code system, except that this here seems more difficult to
> get optimal with a large number of streams IMHO

IMO it's ok in most cases. Suppose stream 0 is video and streams 1-3
are audio. You'll typically be alternating between audio and video
packets, like:

V AAA V AAA V AAA V AAAAAA V AAA V AAA V AAA V AAAAAA

As long as you code the audio packets in order by streamid, you can
use the +1 predictors very easily:

V AAA V AAA V AAA V AAAAAA V AAA V AAA V AAA V AAAAAA
0 +++ + +++ + +++ + +++1++ + +++ + +++ + +++ + +++1++

This pattern works as long as the audio codec has fixed frame size (in
samples). I'm not sure how badly it will break for vorbis.

> IMHO instead of this +1/-1/0 mess a simple n bits for stream id and 5-n for 
> size_lsb with n stored in the main header seems to be a better choice

IMO this is bad. Consider a file with just 1 audio and video stream,
but 7 subtitle streams. (This is an ideal file! :)

You have 9 streams, so 4 bits are needed for streamid. Thus you can
only code audio packets up to 255 bytes without wasting an extra
header byte. But subtitle packets are VERY INFREQUENT and thus you
don't need efficient coding for their headers. It would make a lot
more sense to only use 1 bit for streamid (0=vlc-coded, 1=audio) and
have 4 bits left for size!

> > Now about pts coding...relative or absolute? Rather than wasting space
> > in the header_field to store this, we store the flag in the pts vlc
> > code. The pts vlc actually stores (pts<<1)|relative_pts_flag.
> >
> > Optional changes: Forget about vlc for the header_field and use 3 bits
> > for size_lsb. This way we can encode sizes up to 1024 with just a
> > 2-byte header.
> IMHO, very good idea, we can always increase the file version if we want a 
> different header structure

Yes. It makes avoiding startcode conflict more difficult but IMO it's
worth it for the extra bits.

> > Optional changes, part 2: Make header_field 16-bit and allow more
> > values for stream_code and several more bits for size_lsb. This hurts
> > our best-case overhead quite a bit, but might improve worst-case
> > enough to be worth it...?
> >
> > TODO: Avoid conflicting with start codes. If we keep vlc for the
> > header_field, one way to do this is to change startcodes to have bit 8
> > of their first byte set, and then just make a rule that any extended
> > flags we add in the future won't conflict with the startcode.
> IMHO just ensure that a single combination of flags is invalid and xor the 
> header so this is changed to 'N'

I guess that works too. :)

> > Comments? Michael? Ivan?
> i think we could reduce the number of bits needed for the pts part of the
> header, as the first after a type 2 frame must be full-pts, and afterwards,
> if only +1 timestamps occurred, which for video is likely, there is just +1 vs
> full pts

The problem is audio. PTS can increment by different values depending
on the size (in samples) of the frame. Think vorbis. That's why we
need 3 predictors. But it matters for video too. With mixed 24/30 fps
[inverse telecined] video, you'll have +4 and +5 timestamps (in
120000/1001 timebase).

Rich



