r21189 - trunk/DOCS/tech/nut.txt

Author: michael
Date: Fri Nov 24 13:51:33 2006
New Revision: 21189

Modified:
   trunk/DOCS/tech/nut.txt

Log:
codec_specific_data clarification
if anyone disagrees or has suggestions to improve it then shout

Modified: trunk/DOCS/tech/nut.txt
==============================================================================
--- trunk/DOCS/tech/nut.txt (original)
+++ trunk/DOCS/tech/nut.txt Fri Nov 24 13:51:33 2006
@@ -568,6 +568,13 @@
     the exact format is specified in the codec spec
     for H.264 the NAL units MUST be formatted as in a bytestream
     (with 00 00 01 prefixes)
+    codec_specific_data SHOULD contain exactly the essential global packets
+    needed to decode a stream, more specifically it SHOULD NOT contain packets
+    which contain only non essential metadata like author, title, ...
+    it also MUST NOT contain normal packets which cause the reference decoder
+    to generate any specific decoded samples
+    the encoder name and version, shall be considered essential as it is very
+    usefull to workaround possible encoder bugs
 
 frame_code
     frame_code is an 8-bit field which exists before every frame, it can

On Fri, Nov 24, 2006 at 01:51:34PM +0100, michael wrote:
> if anyone disagrees or has suggestions to improve it then shout
>
> + codec_specific_data SHOULD contain exactly the essential global packets
> + needed to decode a stream, more specifically it SHOULD NOT contain packets
> + which contain only non essential metadata like author, title, ...

For codecs with required stream-embedded metadata like ours, I think this is just making work for the muxer. I'd allow such packets, and instead say that implementations SHOULD maintain and prefer container-level metadata with NUT. The packet should be there, even if it's minimal.

 -r

On Fri, Nov 24, 2006 at 09:05:31AM -0800, Ralph Giles wrote:
> On Fri, Nov 24, 2006 at 01:51:34PM +0100, michael wrote:
> > if anyone disagrees or has suggestions to improve it then shout
> >
> > + codec_specific_data SHOULD contain exactly the essential global packets
> > + needed to decode a stream, more specifically it SHOULD NOT contain packets
> > + which contain only non essential metadata like author, title, ...
>
> For codecs with required stream-embedded metadata like ours, I think this is just making work for the muxer. I'd allow such packets, and instead say that implementations SHOULD maintain and prefer container-level metadata with NUT. The packet should be there, even if it's minimal.

The packet can be there if it's required by the spec, but the metadata fields should all be blank, and should be completely ignored by any player. I would be in favor of a requirement that a compliant player MUST NOT present user-oriented metadata from codec bitstream (just like how using timestamps from the codec bitstream is already forbidden).

Rich

Hi

On Fri, Nov 24, 2006 at 12:37:31PM -0500, Rich Felker wrote:
> On Fri, Nov 24, 2006 at 09:05:31AM -0800, Ralph Giles wrote:
> > On Fri, Nov 24, 2006 at 01:51:34PM +0100, michael wrote:
> > > if anyone disagrees or has suggestions to improve it then shout
> > >
> > > + codec_specific_data SHOULD contain exactly the essential global packets
> > > + needed to decode a stream, more specifically it SHOULD NOT contain packets
> > > + which contain only non essential metadata like author, title, ...
> >
> > For codecs with required stream-embedded metadata like ours, I think this is just making work for the muxer. I'd allow such packets, and instead say that implementations SHOULD maintain and prefer container-level metadata with NUT. The packet should be there, even if it's minimal.
>
> The packet can be there if it's required by the spec, but the metadata fields should all be blank, and should be completely ignored by any player. I would be in favor of a requirement that a compliant player MUST NOT present user-oriented metadata from codec bitstream
hmm, i see 3 possibilities for xiph codecs
1. store the metadata packet as is
2. dont store the metadata packet
3. store a dummy (empty) metadata packet

the metadata has to be parsed and put in a common structure at some point in all cases, be it in the ogg demuxer, vorbis parser, nut muxer or whatever, otherwise the metadata would be pretty much unavailable to a nut player (we cannot require every nut player to be able to parse codec specific metadata)

1. would cause the data to be duplicated, if one copy gets edited by the user we have contradicting data, thats very bad, also the stream headers would be larger than needed, and the stream headers get repeated ...
2. would need some dummy or correct metadata packet to be generated somewhere at the demuxer side, together with splitting the global header into xiph packets if needed (muxing in ogg or vorbis, ... decoder needing it)
3. would need some dummy metadata packet to be generated somewhere at the muxer side, and then be replaced if needed by the correct metadata at the demuxer side, splitting is still needed for muxing in ogg or for vorbis, ... decoders which need it

i dont like 1. at all, 2. and 3. are pretty much the same

if there is an established standard which requires all 3 packets to be stored then no doubt we should follow that, thats also said in nut.txt "the exact format is specified in the codec spec" but there is no such thing, RTP says dont store metadata, other containers and APIs combine the 3 packets in various ways ...

so if xiph "officially" says store all 3 with the following format ... in a single packet then we will certainly do so, if not then i dont know what would be best

also in a generic framework its likely that the metadata packet will be parsed and the metadata be put into a common structure in the ogg demuxer or some parser, not the nut muxer, as other muxers also need the metadata ...
similarly the reverse of this would be done in the ogg muxer or some bitstream filter, not the nut demuxer, as other demuxers set metadata and that should end up in the final ogg stream

[...]
--
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

In the past you could go to a library and read, borrow or copy any book
Today you'd get arrested for mere telling someone where the library is

On Fri, Nov 24, 2006 at 11:35:24PM +0100, Michael Niedermayer wrote:
> hmm, i see 3 possibilities for xiph codecs
> 1. store the metadata packet as is
> 2. dont store the metadata packet
> 3. store a dummy (empty) metadata packet

I would vote for 1 or 3.

I disagreed with doing 2 in the RTP draft, but didn't insist because the payload implementation already has to be codec-specific. For general container use like this, I think it's important to require the metadata packet where the codec requires it. This is less surprising for implementors.

You do still have to know how the vorbis headers are packed, but for playback there is still a difference between these two procedures:

  # option 1 or 3
  for header in global_headers:
    submit_packet_to_codec(header);

  # option 2
  header_number = 0;
  for header in global_headers:
    submit_packet_to_codec(header)
    header_number += 1
    if (header_number == 1):
      header = construct_dummy_metadata_packet()
      submit_packet_to_codec(header)

I'd rather have to write the first one.

This imposes an overhead of (8 bytes + one packet overhead) on any header blob, assuming a minimal vorbis-style comment packet. I would leave the encoder id string in the packet, so

Therefore I'd propose, "The global headers MUST consist of the normal sequence of header packets required for codec initialization, in the order expected by the codec. An implementation MAY strip metadata and other redundant information not necessary for correct playback from the global headers to save space in the bitstream."

 -r

On Fri, Nov 24, 2006 at 03:49:25PM -0800, Ralph Giles wrote:
> This imposes an overhead of (8 bytes + one packet overhead) on any header blob, assuming a minimal vorbis-style comment packet. I would leave the encoder id string in the packet, so

Oops. I forgot the magic, so that should be 7+8 bytes for an empty packet. I would leave the encoder id string in the packet, which would be an extra ~48 bytes in the global header blob.

Speaking of, can NUT info packets be compressed?

 -r
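
The byte counts in this message can be checked with a short sketch. It assumes the Vorbis I comment header layout: a 7-byte packet type plus "vorbis" magic, a 4-byte little-endian vendor length, the vendor string, and a 4-byte user comment count. Vorbis I also requires a trailing framing bit, which costs one more byte than the 7+8 counted above. The vendor string and the function name are hypothetical, chosen only for illustration.

```python
import struct

VORBIS_COMMENT_MAGIC = b"\x03vorbis"  # packet type 3 followed by "vorbis": 7 bytes


def build_comment_packet(vendor: bytes = b"", framing_bit: bool = True) -> bytes:
    """Build a Vorbis-style comment header with a vendor string and no user comments."""
    packet = VORBIS_COMMENT_MAGIC
    packet += struct.pack("<I", len(vendor)) + vendor  # 4-byte length, then the string
    packet += struct.pack("<I", 0)                     # 4-byte user comment count, here zero
    if framing_bit:
        packet += b"\x01"                              # framing bit, padded to a whole byte
    return packet


# Hypothetical encoder id string, not taken from any real encoder.
vendor = b"Example Encoder 1.0 (hypothetical vendor string)"

counted = build_comment_packet(framing_bit=False)  # the 7+8 bytes counted in the message
empty = build_comment_packet()                     # same packet with the framing byte
with_vendor = build_comment_packet(vendor)

# Sizes: without framing byte, with framing byte, and the cost of the vendor string.
print(len(counted), len(empty), len(with_vendor) - len(empty))
```

The container's own per-packet overhead is not modelled here; only the payload of the comment packet is counted.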

Hi

On Fri, Nov 24, 2006 at 04:00:37PM -0800, Ralph Giles wrote:
[...]
> Speaking of, can NUT info packets be compressed?

no, there were some discussions about compressing them by using a table of "common" types/values in a header and then using just numbers instead of strings but we didnt find a solution which everyone agreed to

if it were just me id add such a table to the first info packet after the stream headers ...

[...]
--
Michael

On Sat, Nov 25, 2006 at 01:17:18AM +0100, Michael Niedermayer wrote:
> no, there were some discussions about compressing them by using a table of "common" types/values in a header and then using just numbers instead of strings but we didnt find a solution which everyone agreed to

What about running all the fields together and compressing the blob with deflate, like the PNG zTXT chunk?

 -r

Hi

On Fri, Nov 24, 2006 at 04:30:02PM -0800, Ralph Giles wrote:
> On Sat, Nov 25, 2006 at 01:17:18AM +0100, Michael Niedermayer wrote:
> > no, there were some discussions about compressing them by using a table of "common" types/values in a header and then using just numbers instead of strings but we didnt find a solution which everyone agreed to
>
> What about running all the fields together and compressing the blob with deflate, like the PNG zTXT chunk?

dependency on zlib, and zlib is likely bigger than the demuxer

furthermore if we compress each info packet separately then deflate will likely not be very efficient, if we compress all info packets together then one damaged info packet would cause all info packets afterwards to be lost, and this also wont work with "midstream" info packets

[...]
--
Michael

On Sat, Nov 25, 2006 at 02:38:18AM +0100, Michael Niedermayer wrote:
> Hi
>
> On Fri, Nov 24, 2006 at 04:30:02PM -0800, Ralph Giles wrote:
> > On Sat, Nov 25, 2006 at 01:17:18AM +0100, Michael Niedermayer wrote:
> > > no, there were some discussions about compressing them by using a table of "common" types/values in a header and then using just numbers instead of strings but we didnt find a solution which everyone agreed to
> >
> > What about running all the fields together and compressing the blob with deflate, like the PNG zTXT chunk?
>
> dependency on zlib, and zlib is likely bigger than the demuxer
>
> furthermore if we compress each info packet separately then deflate will likely not be very efficient, if we compress all info packets together then one damaged info packet would cause all info packets afterwards to be lost, and this also wont work with "midstream" info packets

exactly. mandating zlib support is absolutely out of the question. the gain in size is _tiny_ and the penalty in implementation complexity is huge.

rich
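
The efficiency point about per-packet deflate can be illustrated with a small sketch using Python's standard zlib module. The packets below are stand-in byte strings of NUL-separated name/value pairs, not the real NUT info packet layout, and the field names and values are invented for the example.

```python
import zlib

# Stand-in info packets (hypothetical layout): short, with no repetition inside a packet.
packets = [
    b"title\x00Track %d\x00author\x00Nobody\x00" % i
    for i in range(20)
]

raw_total = sum(len(p) for p in packets)

# Each packet compressed on its own pays the zlib header and checksum every time
# and cannot reference field names seen in earlier packets.
separate_total = sum(len(zlib.compress(p)) for p in packets)

# All packets compressed as one blob share a single dictionary window,
# at the cost that damage early in the blob affects everything after it.
combined_total = len(zlib.compress(b"".join(packets)))

print(raw_total, separate_total, combined_total)
```

With inputs this small, compressing each packet separately produces more bytes than the raw data, while the combined blob compresses well. That is the trade-off described above: per-packet deflate gains little, and a combined stream ties every packet to the ones before it.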

Hi

On Fri, Nov 24, 2006 at 03:49:25PM -0800, Ralph Giles wrote:
> On Fri, Nov 24, 2006 at 11:35:24PM +0100, Michael Niedermayer wrote:
> > hmm, i see 3 possibilities for xiph codecs
> > 1. store the metadata packet as is
> > 2. dont store the metadata packet
> > 3. store a dummy (empty) metadata packet
>
> I would vote for 1 or 3.
>
> I disagreed with doing 2 in the RTP draft, but didn't insist because the payload implementation already has to be codec-specific. For general container use like this, I think it's important to require the metadata packet where the codec requires it. This is less surprising for implementors.
>
> You do still have to know how the vorbis headers are packed, but for playback there is still a difference between these two procedures:
>
>   # option 1 or 3
>   for header in global_headers:
>     submit_packet_to_codec(header);
>
>   # option 2
>   header_number = 0;
>   for header in global_headers:
>     submit_packet_to_codec(header)
>     header_number += 1
>     if (header_number == 1):
>       header = construct_dummy_metadata_packet()
>       submit_packet_to_codec(header)
  submit_packet_to_codec(header)
  submit_packet_to_codec(dummy_defined_in_a_static_char_array)
  submit_packet_to_codec(header)

but in reality the demuxer will likely pass the single global header around unchanged, and the vorbis decoder (or wrapper) would then do

option 1 or 3
  parse_header1();
  skip_header2();
  parse_header3();
vs.
option 2
  parse_header1();
  parse_header3();

your method might, depending on the API, end up looking like

demuxer_open(){
    blah blah
    for all streams
        if stream is one of vorbis, theora, ...
            stream->xiph_glob= alloc(3)
            stream->xiph_glob_len= alloc(3)
            stream->xiph_glob_index=0;
            for i in 3
                find len of packet
                stream->xiph_glob_len[i]= len
                stream->xiph_glob[i]= alloc(len)
                read packet into stream->xiph_glob[i]
}

demuxer_get_packet()
    if(stream->xiph_glob_index<3 && stream->xiph_glob[stream->xiph_glob_index]){
        packet.dts= dunno has none
        packet.pts= dunno has none
        packet.duration= 0
        packet.flags= ?
        packet.len= stream->xiph_glob_len[stream->xiph_glob_index]
        memcpy(packet.data, stream->xiph_glob[stream->xiph_glob_index], packet.len);
        /* release the packet just returned, then advance to the next one */
        stream->xiph_glob_len[stream->xiph_glob_index]=0
        free(stream->xiph_glob[stream->xiph_glob_index])
        stream->xiph_glob_index++
        return
    }

considering that this would be needed for every non ogg demuxer i wouldnt implement it like that ...

[...]
> Therefore I'd propose, "The global headers MUST consist of the normal sequence of header packets required for codec initialization, in the order expected by the codec. An implementation MAY strip metadata and other redundant information not necessary for correct playback from the global headers to save space in the bitstream."

i would s/expected by the codec/defined in the codec spec/
otherwise its highly unclear, think about mpeg and its headers, the spec is clear about what is allowed and what is not, specific implementations might be more forgiving but still its the mpegvideo spec which we should follow in that case

also i would rather say
An implementation MAY/SHOULD/MUST strip metadata and other redundant information not necessary for correct playback from the global headers as long as no incorrect values are stored and as long as the stripped result is not less valid per codec spec than before stripping
(probably that could be improved too ...)

anyway after thinking about that a little option 3 with the encoder name and version in the metadata packet seems like the best solution ...

[...]
--
Michael

On Sat, Nov 25, 2006 at 01:53:18AM +0100, Michael Niedermayer wrote:
> > Therefore I'd propose, "The global headers MUST consist of the normal sequence of header packets required for codec initialization, in the order expected by the codec. An implementation MAY strip metadata and other redundant information not necessary for correct playback from the global headers to save space in the bitstream."
>
> i would s/expected by the codec/defined in the codec spec/

Yes, that's better.

> An implementation MAY/SHOULD/MUST strip metadata and other redundant information not necessary for correct playback from the global headers as long as no incorrect values are stored and as long as the stripped result is not less valid per codec spec than before stripping

It shouldn't be MUST because that would be a silly reason to stop parsing a stream. I've already voted for MAY.

> anyway after thinking about that a little option 3 with the encoder name and version in the metadata packet seems like the best solution ...

That would be fine with me.

 -r

On Fri, Nov 24, 2006 at 11:35:24PM +0100, Michael Niedermayer wrote:
> Hi
>
> On Fri, Nov 24, 2006 at 12:37:31PM -0500, Rich Felker wrote:
> > On Fri, Nov 24, 2006 at 09:05:31AM -0800, Ralph Giles wrote:
> > > On Fri, Nov 24, 2006 at 01:51:34PM +0100, michael wrote:
> > > > if anyone disagrees or has suggestions to improve it then shout
> > > >
> > > > + codec_specific_data SHOULD contain exactly the essential global packets
> > > > + needed to decode a stream, more specifically it SHOULD NOT contain packets
> > > > + which contain only non essential metadata like author, title, ...
> > >
> > > For codecs with required stream-embedded metadata like ours, I think this is just making work for the muxer. I'd allow such packets, and instead say that implementations SHOULD maintain and prefer container-level metadata with NUT. The packet should be there, even if it's minimal.
> >
> > The packet can be there if it's required by the spec, but the metadata fields should all be blank, and should be completely ignored by any player. I would be in favor of a requirement that a compliant player MUST NOT present user-oriented metadata from codec bitstream
>
> hmm, i see 3 possibilities for xiph codecs
> 1. store the metadata packet as is

What does "as is" mean? This packet should be empty in the case of a new encoding, anyway. The only way it would contain data is when remuxing from ogg..

> 2. dont store the metadata packet
> 3. store a dummy (empty) metadata packet

IMO option 3 is the best. It's no horrible problem if option 1 happens sometime in practice, but players should be considered noncompliant if they use the metadata from these headers. "Smart" muxers could be vorbis-aware and strip any crap out of the header before muxing. :)

Rich
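
A vorbis-aware muxer of the kind mentioned above could, as a rough sketch, rewrite the comment header so that it keeps only the vendor (encoder id) string and declares zero user comments, which corresponds to option 3 with the encoder name and version retained. The function name is hypothetical, and the layout assumed is the Vorbis I comment header: 7-byte magic, 4-byte little-endian vendor length, vendor string, 4-byte comment count, comments, framing bit.

```python
import struct

MAGIC = b"\x03vorbis"  # packet type 3 followed by "vorbis"


def strip_user_comments(packet: bytes) -> bytes:
    """Return a comment header that keeps the vendor string but no user comments."""
    if not packet.startswith(MAGIC):
        raise ValueError("not a Vorbis comment header")
    pos = len(MAGIC)
    (vendor_len,) = struct.unpack_from("<I", packet, pos)
    pos += 4
    vendor = packet[pos:pos + vendor_len]
    if len(vendor) != vendor_len:
        raise ValueError("truncated vendor string")
    # Rebuild the packet: same vendor string, zero user comments, framing bit set.
    return MAGIC + struct.pack("<I", vendor_len) + vendor + struct.pack("<I", 0) + b"\x01"


# Usage with a hand-built input packet; vendor and comments are invented for the example.
vendor = b"Example Encoder 1.0"
original = MAGIC + struct.pack("<I", len(vendor)) + vendor + struct.pack("<I", 2)
for comment in (b"TITLE=Some Title", b"ARTIST=Somebody"):
    original += struct.pack("<I", len(comment)) + comment
original += b"\x01"

stripped = strip_user_comments(original)
print(len(original), len(stripped))
```

The user comments removed this way would have to be carried in container-level metadata instead, as discussed earlier in the thread.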