
hi all,
lu wants me to help on drafting and formatting the rfc, and i think this is a good thing. so i started looking at it and have some ideas for changes to make, but i don't want to step on any feet..
abstract: this should mention the codec-agnosticism and ability to perform container-level operations without any awareness of the data format, since these aspects are probably the most novel from a multimedia standards perspective. i want to add the text "without regard for the specific formats of the streams, and" before "with minimal computational cost" and perhaps work more on improving the content of the abstract.
definitions:
the definition of pts is somewhat confusing in relation to b frames and out of order decoding. can we do something about "completed by decoding the coded frame" to make it more clear?
the definition of dts is rather vague...
frame: i had a good formal definition the other day but can't remember at the moment...
keyframe: rewording for clarity and formal english:
A keyframe is a frame in a stream at which decoding of that stream can successfully begin independent of prior frames. Keyframe status of frames within one stream is independent of any other streams.
Assign to each frame in a stream an integer position n. The nth frame of a stream is a keyframe if and only if the mth frame, for each m≥n+k, can be decoded successfully without reference to data contained in the lth frame for any l<n, where k is the smallest nonnegative integer such that keyframes other than the initial frame can exist in the stream's coding scheme. This definition coincides with the ordinary notion of a keyframe for common video coding schemes where k=0, and also allows for overlapped-window audio coding schemes where k≥1 accounts for the missing overlap from the previous frame.
Readers should be aware that this definition of keyframe is dependent on the ordering of frames according to pts and dts as mandated by this standard.
data types: strings: the requirement that strings not contain U+0000 is mysteriously missing. also, i would like to change the terminology from "string" to "text" to align mostly with the posix sense of text (i.e. bytes must represent characters and must not contain 0). string is somewhat less precise imo because to some people binary data (or binary data without embedded 0's) is a "string" but not "text". also then we could move the requirements of text to a "definition of text" under the definitions section, which i think would be nice organizationally speaking.
finally, lu thought this was okay but i'd like to run it by others: can i commit changes like these directly to the rfc xml file? the intent would not be to impose my wishes on others, but rather to accelerate completion of the document. i would certainly welcome others to revert or heavily alter anything objectionable and start discussions on the list as appropriate, but i feel now that we're to the stage of documenting the format we already designed rather than designing it, the types of potential objections are very different and much less severe and we always have the authoritative nut.txt to go back to. if anyone else has differing or better ideas on how to work on the rfc i'd be happy to hear them too.
rich
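The proposed "text" rule (bytes must represent characters and must not contain 0) could be checked roughly as follows. This is only a sketch: it assumes the bytes are UTF-8 encoded, which this message does not state, and `is_valid_text` is a hypothetical helper name rather than anything from nut.txt.

```python
def is_valid_text(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8 (assumed encoding) with no U+0000."""
    try:
        # Strict decoding rejects malformed sequences, including the
        # overlong two-byte encoding of NUL (b"\xc0\x80").
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    # The rule discussed above: text must not contain U+0000.
    return "\x00" not in text


print(is_valid_text("codec-agnostic".encode("utf-8")))
print(is_valid_text(b"abc\x00def"))
print(is_valid_text(b"\xff\xfe"))
```

Binary data without embedded zeros can still fail this check, which is the distinction between "string" and "text" that the message draws.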

Rich Felker wrote:
finally, lu thought this was okay but i'd like to run it by others: can i commit changes like these directly to the rfc xml file? the intent would not be to impose my wishes on others, but rather to accelerate completion of the document. i would certainly welcome others to revert or heavily alter anything objectionable and start discussions on the list as appropriate, but i feel now that we're to the stage of documenting the format we already designed rather than designing it, the types of potential objections are very different and much less severe and we always have the authoritative nut.txt to go back to. if anyone else has differing or better ideas on how to work on the rfc i'd be happy to hear them too.
to summarize even more: at least for me it's fine if everybody with something to say just commits the change to the svn and then everybody comments on it. this draft should be as volatile as possible right now, pending some style rules I'll enforce (line size, 2 space indentation, tag indentation basically). Once the document is complete I'll push it to ietf and then will start the revision iteration.
lu
-- Luca Barbato Gentoo/linux Gentoo/PPC http://dev.gentoo.org/~lu_zero

On Fri, Nov 16, 2007 at 01:13:34AM -0500, Rich Felker wrote:
hi all,
lu wants me to help on drafting and formatting the rfc, and i think this is a good thing. so i started looking at it and have some ideas for changes to make, but i don't want to step on any feet..
abstract: this should mention the codec-agnosticism and ability to perform container-level operations without any awareness of the data format, since these aspects are probably the most novel from a multimedia standards perspective. i want to add the text "without regard for the specific formats of the streams, and" before "with minimal computational cost" and perhaps work more on improving the content of the abstract.
very good, i fully agree
definitions:
the definition of pts is somewhat confusing in relation to b frames and out of order decoding. can we do something about "completed by decoding the coded frame" to make it more clear?
the definition of dts is rather vague...
frame: i had a good formal definition the other day but can't remember at the moment...
keyframe: rewording for clarity and formal english:
A keyframe is a frame in a stream at which decoding of that stream can successfully begin independent of prior frames. Keyframe status of frames within one stream is independent of any other streams.
ok
Assign to each frame in a stream an integer position n. The nth frame of a stream is a keyframe if and only if the mth frame, for each m≥n+k, can be decoded successfully without reference to data contained in the lth frame for any l<n, where k is the smallest nonnegative integer such that keyframes other than the initial frame can exist in the stream's coding scheme. This definition coincides with the ordinary notion of a keyframe for common video coding schemes where k=0, and also allows for overlapped-window audio coding schemes where k≥1 accounts for the missing overlap from the previous frame.
what about normal b frames? IPBBIBBBBBPPP
here if we start decoding from the 2nd I frame none of the B frames can be decoded as they need the previous frames but they are after the I frame position wise
Readers should be aware that this definition of keyframe is dependent on the ordering of frames according to pts and dts as mandated by this standard.
data types: strings: the requirement that strings not contain U+0000 is mysteriously missing. also, i would like to change the terminology from "string" to "text" to align mostly with the posix sense of text (i.e. bytes must represent characters and must not contain 0). string is somewhat less precise imo because to some people binary data (or binary data without embedded 0's) is a "string" but not "text". also then we could move the requirements of text to a "definition of text" under the definitions section, which i think would be nice organizationally speaking.
ok
finally, lu thought this was okay but i'd like to run it by others: can i commit changes like these directly to the rfc xml file? the intent would not be to impose my wishes on others, but rather to accelerate completion of the document. i would certainly welcome others to revert or heavily alter anything objectionable and start discussions on the list as appropriate, but i feel now that we're to the stage of documenting the format we already designed rather than designing it, the types of potential objections are very different and much less severe and we always have the authoritative nut.txt to go back to. if anyone else has differing or better ideas on how to work on the rfc i'd be happy to hear them too.
ok, commit as you see fit, i'll flame you as i see fit :)
[...]
-- Michael
GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
In a rich man's house there is no place to spit but his face. -- Diogenes of Sinope
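The B frame objection can be checked mechanically against the positional definition quoted above. The sketch below is an illustration under assumptions: frames are listed in stored order, `REFS` is a guessed dependency structure for IPBBIBBBBBPPP (the thread does not spell out which frames each B frame references), and `is_keyframe_positional` is a hypothetical name.

```python
def is_keyframe_positional(refs, n, k=0):
    """Positional definition: every frame m >= n+k must not directly
    reference any frame l < n. refs[m] is the set of positions that
    frame m references."""
    return all(
        all(l >= n for l in refs[m])
        for m in range(n + k, len(refs))
    )


# Assumed dependencies for the sequence IPBBIBBBBBPPP (positions 0..12).
REFS = [
    set(),    # 0  I
    {0},      # 1  P
    {0, 1},   # 2  B
    {0, 1},   # 3  B
    set(),    # 4  I (the 2nd I frame)
    {1, 4}, {1, 4}, {1, 4}, {1, 4}, {1, 4},  # 5-9  B, need a frame before the I
    {4},      # 10 P
    {10},     # 11 P
    {11},     # 12 P
]

print(is_keyframe_positional(REFS, 0))
print(is_keyframe_positional(REFS, 4))
```

Under these assumptions the 2nd I frame fails the positional test with k=0, because the B frames stored after it still reference a frame stored before it.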

Hi
On Fri, Nov 16, 2007 at 04:53:49PM +0100, Michael Niedermayer wrote:
[...]
keyframe: rewording for clarity and formal english:
A keyframe is a frame in a stream at which decoding of that stream can successfully begin independent of prior frames. Keyframe status of frames within one stream is independent of any other streams.
ok
Assign to each frame in a stream an integer position n. The nth frame of a stream is a keyframe if and only if the mth frame, for each m≥n+k, can be decoded successfully without reference to data contained in the lth frame for any l<n, where k is the smallest nonnegative integer such that keyframes other than the initial frame can exist in the stream's coding scheme. This definition coincides with the ordinary notion of a keyframe for common video coding schemes where k=0, and also allows for overlapped-window audio coding schemes where k≥1 accounts for the missing overlap from the previous frame.
what about normal b frames? IPBBIBBBBBPPP
here if we start decoding from the 2nd I frame none of the B frames can be decoded as they need the previous frames but they are after the I frame position wise
here's a different definition for keyframes, note this is not identical to what is in nut.txt
A frame in a stream is a keyframe if and only if all of the following are true:
* Decoding can successfully begin using any standard compliant decoder without requiring access to prior frames.
* Beginning decoding instead at a subsequent frame would cause fewer frames to be decoded successfully.
successful decoding here means that the specific frame is virtually identical to what one would get if decoding had begun from the very first frame
Note, "virtually identical" here is used instead of "identical" to allow codecs which converge toward the same output when started from different points but don't necessarily ever reach exactly identical output.
(copied from nut.txt: Every frame which is marked as a keyframe MUST be a keyframe according to the definition above, a muxer MUST mark every frame it knows is a keyframe as such, a muxer SHOULD NOT analyze future frames to determine the keyframe status of the current frame but instead just set the frame as non-keyframe.)
[...]
-- Michael

On Sat, Nov 17, 2007 at 04:55:00PM +0100, Michael Niedermayer wrote:
what about normal b frames? IPBBIBBBBBPPP
here if we start decoding from the 2nd I frame none of the B frames can be decoded as they need the previous frames but they are after the I frame position wise
Yes, sorry, I left out the PTS part... :( I can fix it.
here's a different definition for keyframes, note this is not identical to what is in nut.txt
A frame in a stream is a keyframe if and only if all of the following are true:
* Decoding can successfully begin using any standard compliant decoder without requiring access to prior frames.
* Beginning decoding instead at a subsequent frame would cause fewer frames to be decoded successfully.
I'm confused what this second condition is supposed to mean. Of course if you start at a subsequent frame (e.g. the n+1 frame instead of the nth frame) you'll decode fewer frames, provided that you were able to successfully decode the nth frame to begin with. But maybe this characterization is what you're trying to get at:
"A frame N is a keyframe if the set of frames which can be successfully decoded by starting with frame N is a strict superset of the set of frames which can be successfully decoded by starting with frame N+1."
with "successful" having the meaning you described (perhaps adjusted to be more formal). I have replaced your idea of "more/fewer" with "strict superset/subset" to conform to the fact that the number of frames in stream need not be practically finite.
Note however that your definition here (and my version of it) have the (perhaps unfortunate) effect that all frames would be keyframes in a hypothetical codec where the (N%total_blocks)th block is intra-coded in the Nth frame and all other blocks in the Nth frame are residue-coded. This may be better than having no keyframes at all, but it seems sort of perverse nonetheless.
Rich
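The strict-superset wording can be tried on a finite toy stream. This is a sketch under assumptions, not the nut.txt rule: frames are in stored order, `REFS` is a guessed dependency structure for the IPBBIBBBBBPPP example, a frame counts as decodable only when every frame it references was itself decoded, and `decodable_from` / `is_keyframe_superset` are hypothetical names.

```python
def decodable_from(refs, start):
    """Set of frame positions decodable when decoding begins at `start`.
    refs[m] is the set of earlier positions that frame m references."""
    ok = set()
    for m in range(start, len(refs)):
        # A frame decodes only if all of its references were decoded.
        if all(r in ok for r in refs[m]):
            ok.add(m)
    return ok


def is_keyframe_superset(refs, n):
    """Frame n is a keyframe if starting at n decodes a strict superset
    of what starting at n+1 decodes (`>` on sets is proper superset)."""
    return decodable_from(refs, n) > decodable_from(refs, n + 1)


# Assumed dependencies for IPBBIBBBBBPPP (positions 0..12).
REFS = [
    set(), {0}, {0, 1}, {0, 1},              # I P B B
    set(),                                   # I
    {1, 4}, {1, 4}, {1, 4}, {1, 4}, {1, 4},  # B B B B B
    {4}, {10}, {11},                         # P P P
]

keyframes = [n for n in range(len(REFS)) if is_keyframe_superset(REFS, n)]
print(keyframes)
```

With this all-or-nothing notion of decodability, only the two I frames qualify. The hypothetical intra-refresh codec behaves differently because its frames become correct gradually, which this toy model does not capture.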

Hi
On Sun, Nov 18, 2007 at 01:32:18AM -0500, Rich Felker wrote:
On Sat, Nov 17, 2007 at 04:55:00PM +0100, Michael Niedermayer wrote:
what about normal b frames? IPBBIBBBBBPPP
here if we start decoding from the 2nd I frame none of the B frames can be decoded as they need the previous frames but they are after the I frame position wise
Yes, sorry, I left out the PTS part... :( I can fix it.
here's a different definition for keyframes, note this is not identical to what is in nut.txt
A frame in a stream is a keyframe if and only if all of the following are true:
* Decoding can successfully begin using any standard compliant decoder without requiring access to prior frames.
* Beginning decoding instead at a subsequent frame would cause fewer frames to be decoded successfully.
I'm confused what this second condition is supposed to mean. Of course if you start at a subsequent frame (e.g. the n+1 frame instead of the nth frame) you'll decode fewer frames, provided that you were able to successfully decode the nth frame to begin with. But maybe this characterization is what you're trying to get at:
"A frame N is a keyframe if the set of frames which can be successfully decoded by starting with frame N is a strict superset of the set of frames which can be successfully decoded by starting with frame N+1."
with "successful" having the meaning you described (perhaps adjusted to be more formal). I have replaced your idea of "more/fewer" with "strict superset/subset" to conform to the fact that the number of frames in stream need not be practically finite.
Note however that your definition here (and my version of it) have the (perhaps unfortunate) effect that all frames would be keyframes in a hypothetical codec where the (N%total_blocks)th block is intra-coded in the Nth frame and all other blocks in the Nth frame are residue-coded. This may be better than having no keyframes at all, but it seems sort of perverse nonetheless.
that was the reason why i wrote it like that :)
such streams exist and it even makes sense for them to exist ... just think of realtime communication over a low bandwidth channel (portable phones or something like that would be an example)
i frames are many times bigger than p frames at the same quality, so having p frames with let's say 10% intra coded blocks leads to lower delay than an i frame would. and i frames at lower quality would lead to unacceptable flickering every time a low quality blocky I frame would be received
so yes, i think all frames should be key frames in that case ...
also the encoder can optimize the decision of which blocks to code as intra ...
[...]
-- Michael

On Sun, Nov 18, 2007 at 02:48:51PM +0100, Michael Niedermayer wrote:
that was the reason why i wrote it like that :) such streams exist and it even makes sense for them to exist ... just think of realtime communication over a low bandwidth channel (portable phones or something like that would be an example) i frames are many times bigger than p frames at the same quality so having p frames with lets say 10% intra coded blocks leads to lower delay than i frame would. and i frames at lower quality would lead to unacceptable flickering every time a low quality blocky I frame would be received so yes, i think all frames should be key frames in that case ... also the encoder can optimize the decision of which blocks to code as intra ...
I agree such codecs should exist, I just wonder whether considering all frames to be keyframes is an appropriate solution. After all, imagine a program using a file with such a codec as a source for editing. "This frame is a keyframe" does not mean much if it doesn't guarantee you can decode all frames with pts >= this_frame.pts without significant error. You might as well just ignore the keyframe flag for seeking then.. Rich
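To make the editing concern concrete, here is a small simulation of the hypothetical codec in which block N % total_blocks is intra-coded in frame N and every other block is residue-coded. It is a simplified sketch: a block is treated as exactly correct once it has been intra-refreshed (the thread only asks for "virtually identical" output), and `first_clean_frame` is a hypothetical name.

```python
def first_clean_frame(start, total_blocks):
    """Position of the first frame whose blocks are all correct when
    decoding begins at `start` with no earlier frames available."""
    valid = [False] * total_blocks
    n = start
    while True:
        # Frame n intra-codes exactly one block, which becomes correct.
        valid[n % total_blocks] = True
        # Residue-coded blocks are correct only if already correct before.
        if all(valid):
            return n
        n += 1


print(first_clean_frame(3, 10))
```

In this model a decoder that starts at any frame needs total_blocks - 1 further frames before its output is fully correct, so a keyframe flag on every frame says little about where error-free output begins.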

On Sun, Nov 18, 2007 at 09:42:14PM -0500, Rich Felker wrote:
On Sun, Nov 18, 2007 at 02:48:51PM +0100, Michael Niedermayer wrote:
that was the reason why i wrote it like that :) such streams exist and it even makes sense for them to exist ... just think of realtime communication over a low bandwidth channel (portable phones or something like that would be an example) i frames are many times bigger than p frames at the same quality so having p frames with lets say 10% intra coded blocks leads to lower delay than i frame would. and i frames at lower quality would lead to unacceptable flickering every time a low quality blocky I frame would be received so yes, i think all frames should be key frames in that case ... also the encoder can optimize the decision of which blocks to code as intra ...
I agree such codecs should exist, I just wonder whether considering all frames to be keyframes is an appropriate solution. After all, imagine a program using a file with such a codec as a source for editing. "This frame is a keyframe" does not mean much if it doesn't guarantee you can decode all frames with pts >= this_frame.pts without significant error. You might as well just ignore the keyframe flag for seeking then..
it shouldn't be hard to consider the frames with significant error in the back ptrs and index (it's pretty much just replacing pts[x] by pts[x]+delay_until_no_error[x])
i of course haven't considered what effects that would have on "optimal seeking", but as long as delay_until_no_error[x]=0 it shouldn't be affected
and if you don't set them as keyframes you can't seek, or how should the demuxer know it should ignore keyframe flags for a specific file?
[...]
-- Michael
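One possible reading of the pts[x]+delay_until_no_error[x] idea, sketched as a seek helper. The function name, the list-based layout, and the tie-breaking rule are assumptions for illustration only; nothing here is taken from nut.txt or from an existing demuxer.

```python
def pick_seek_keyframe(pts, delay_until_no_error, target):
    """Index of the latest keyframe whose output is already error-free
    at `target`, or None if no keyframe qualifies.
    pts[x] and delay_until_no_error[x] describe keyframe x."""
    best = None
    for x, (p, d) in enumerate(zip(pts, delay_until_no_error)):
        # Use pts[x] + delay_until_no_error[x] in place of pts[x].
        if p + d <= target:
            # Prefer the latest start, which means the least decoding work.
            if best is None or p > pts[best]:
                best = x
    return best


pts = [0, 100, 200, 300]
print(pick_seek_keyframe(pts, [0, 0, 0, 0], 250))
print(pick_seek_keyframe(pts, [0, 90, 90, 90], 250))
```

With every delay equal to 0 the choice is the ordinary "last keyframe at or before the target", which matches the remark that seeking should be unaffected in that case.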

Michael Niedermayer wrote:
... and if you dont set them as keyframes you cant seek or how should the demuxer know it should ignore keyframe flags for a specific file?
It appears the exact definition of "keyframe" or "successful decoding" is codec- and application-specific.
Since the only thing the nut format does with keyframes is to optimize seeking to them, and since it does _not_ specify how codecs or applications should behave when decoding, I think it would be a good idea if the definition doesn't rely on the meaning of "successful decoding".
In other words, a muxer uses a keyframe when it wants a demuxer to be able to seek to it, but the determination of whether a frame can be a keyframe is left up to the encoder/muxer.
Regards, Clemens

On Mon, Nov 19, 2007 at 04:27:43PM +0100, Clemens Ladisch wrote:
Michael Niedermayer wrote:
... and if you dont set them as keyframes you cant seek or how should the demuxer know it should ignore keyframe flags for a specific file?
It appears the exact definition of "keyframe" or "successful decoding" is codec- and application-specific.
Since the only thing the nut format does with keyframes is to optimize seeking to them, and since it does _not_ specify how codecs or applications should behave when decoding, I think it would be a good idea if the definition doesn't rely on the meaning of "successful decoding".
In other words, a muxer uses a keyframe when it wants a demuxer to be able to seek to it, but the determination of whether a frame can be a keyframe is left up to the encoder/muxer.
that works as long as the muxer only marks frames as keyframes which can be successfully decoded
if it does _anything_ else it will generate files which are broken, as the demuxer/decoder can no longer decode the keyframe, or to say it differently, it can no longer rely on being able to decode the keyframe
that would break literally everything and make nut as unseekable as mpeg-ps/ts
[...]
-- Michael

Michael Niedermayer wrote:
On Mon, Nov 19, 2007 at 04:27:43PM +0100, Clemens Ladisch wrote:
Michael Niedermayer wrote:
... and if you dont set them as keyframes you cant seek or how should the demuxer know it should ignore keyframe flags for a specific file?
It appears the exact definition of "keyframe" or "successful decoding" is codec- and application-specific.
Since the only thing the nut format does with keyframes is to optimize seeking to them, and since it does _not_ specify how codecs or applications should behave when decoding, I think it would be a good idea if the definition doesn't rely on the meaning of "successful decoding".
In other words, a muxer uses a keyframe when it wants a demuxer to be able to seek to it, but the determination of whether a frame can be a keyframe is left up to the encoder/muxer.
that works as long as the muxer only marks frames as keyframes which can be successfully decoded if it does _anything_ else it will generate files which are broken
Yes. My point is that there can be applications with different requirements as to what constitutes successful decoding, like in the 10% intra blocks example that I snipped, where a real-time player should be able to (re-)start playing everywhere but an editor would not want to.
A muxer might want to use different algorithms for the same data, depending on how many frames it wants to appear as keyframes in the index and in the syncpoint structure, because those are the actual effects that setting a frame's key flag has on a nut file.
Regards, Clemens

On Tue, Nov 20, 2007 at 09:05:01AM +0100, Clemens Ladisch wrote:
Michael Niedermayer wrote:
On Mon, Nov 19, 2007 at 04:27:43PM +0100, Clemens Ladisch wrote:
Michael Niedermayer wrote:
... and if you dont set them as keyframes you cant seek or how should the demuxer know it should ignore keyframe flags for a specific file?
It appears the exact definition of "keyframe" or "successful decoding" is codec- and application-specific.
Since the only thing the nut format does with keyframes is to optimize seeking to them, and since it does _not_ specify how codecs or applications should behave when decoding, I think it would be a good idea if the definition doesn't rely on the meaning of "successful decoding".
In other words, a muxer uses a keyframe when it wants a demuxer to be able to seek to it, but the determination of whether a frame can be a keyframe is left up to the encoder/muxer.
that works as long as the muxer only marks frames as keyframes which can be successfully decoded if it does _anything_ else it will generate files which are broken
Yes. My point is that there can be applications with different requirements as to what constitutes successful decoding, like in the
and my point is that this is not true
10% intra blocks example that I snipped, where a real-time player should be able to (re-)start playing everywhere but an editor would not want to.
you don't want your editor to be able to edit the file? because that is what would happen if it cannot start anywhere (= no classic keyframes)
maybe your confusion comes from the assumption that a decoder would output incorrectly decoded frames after seeking
at least for h.264 this is not the case, there are packets describing when the output is correct
we could surely duplicate this information in the frames in nut
A muxer might want to use different algorithms for the same data, depending on how many frames it wants to appear as keyframes in the index and in the syncpoint structure, because those are the actual effects that setting a frame's key flag has on a nut file.
nut.txt says "a muxer MUST mark every frame it knows is a keyframe as such"
[...]
-- Michael

On Tue, Nov 20, 2007 at 09:05:01AM +0100, Clemens Ladisch wrote:
A muxer might want[...]
This is contrary to the principles of NUT. The muxer does not create a file based on its own opinion about how the user of the file will want to use it. There's one correct way to mux (aside from decisions that just affect efficiency of the storage) which guarantees that a demuxer can obtain the same behavior on any valid file without having to guess what the muxer intended.
Moreover the whole idea you've proposed is irrelevant to our hypothetical codec. Think about it for a second and you'll realize that selecting the points to which seeking is allowed does nothing to help the issue, unless you only mark "true" keyframes as keyframes, i.e. only the first frame.
One possible thing that would help would be an optional "convergence time bound" header field.
Rich
participants (4)
- Clemens Ladisch
- Luca Barbato
- Michael Niedermayer
- Rich Felker