
Author: michael
Date: Tue Feb 12 16:00:09 2008
New Revision: 613

Log:
More about interleaving.

Modified:
   docs/nutissues.txt

Modified: docs/nutissues.txt
==============================================================================
--- docs/nutissues.txt	(original)
+++ docs/nutissues.txt	Tue Feb 12 16:00:09 2008
@@ -162,3 +162,8 @@ How do we identify the interleaving
 A. fourcc
 B. extradata
 C. New field in the stream header
+D. Only allow 1 standard interleaving
+
+What about the interleaving of non raw codecs, do all specify the
+interleaving, or does any leave it to the container? If so, our options
+would be down to only C.

On Tue, 12 Feb 2008 16:00:10 +0100 (CET) michael <subversion@mplayerhq.hu> wrote:
Modified: docs/nutissues.txt
==============================================================================
--- docs/nutissues.txt	(original)
+++ docs/nutissues.txt	Tue Feb 12 16:00:09 2008
@@ -162,3 +162,8 @@ How do we identify the interleaving
 A. fourcc
 B. extradata
I would vote for this, with a single fourcc for pcm and a single fourcc for raw video. Having info about the data format packed in the fourcc is ugly and useless. That just leads to inflexible lookup tables and the like. Instead we should just define the format in a way similar to what mp_image provides for video (colorspace, packed or not, shift used for the subsampled planes, etc). That would allow implementations to simply support all definable formats, instead of a selection of what happened to be commonly used formats at the time the implementation was written.
 C. New field in the stream header
+D. Only allow 1 standard interleaving
+
+What about the interleaving of non raw codecs, do all specify the
+interleaving, or does any leave it to the container? If so, our options
+would be down to only C.
On a related subject, it might also be useful to define the channel disposition when there is more than one. Mono and stereo can get by with the classical default, but as soon as there are more channels it is really unclear. And imho such info could still be useful with 1 or 2 channels. Something like the position of each channel in polar coordinates (2D or 3D?) should be enough.

Albeu

On Tue, Feb 12, 2008 at 05:47:13PM +0100, Alban Bedel wrote:
On Tue, 12 Feb 2008 16:00:10 +0100 (CET) michael <subversion@mplayerhq.hu> wrote:
Modified: docs/nutissues.txt
==============================================================================
--- docs/nutissues.txt	(original)
+++ docs/nutissues.txt	Tue Feb 12 16:00:09 2008
@@ -162,3 +162,8 @@ How do we identify the interleaving
 A. fourcc
 B. extradata
I would vote for this, with a single fourcc for pcm and a single fourcc for raw video. Having info about the data format packed in the fourcc is ugly and useless. That just leads to inflexible lookup tables and the like.
Instead we should just define the format in a way similar to what mp_image provides for video (colorspace, packed or not, shift used for the subsampled planes, etc). That would allow implementations to simply support all definable formats, instead of a selection of what happened to be commonly used formats at the time the implementation was written.
The key points here are that
* colorspace/shift for subsampled planes, etc is not specific to RAW; it's more like sample_rate or width/height
* non raw codecs have clearly defined global headers (sometimes at least) -> thus we can't really use extradata for it

extradata would only be ok for things we definitely don't ever need for non raw
 C. New field in the stream header
+D. Only allow 1 standard interleaving
+
+What about the interleaving of non raw codecs, do all specify the
+interleaving, or does any leave it to the container? If so, our options
+would be down to only C.
On a related subject, it might also be useful to define the channel disposition when there is more than one. Mono and stereo can get by with the classical default, but as soon as there are more channels it is really unclear. And imho such info could still be useful with 1 or 2 channels. Something like the position of each channel in polar coordinates (2D or 3D?) should be enough.
I agree.
What about that LFE channel thing?

And where do we put this info? The stream header seems the logical place if you ask me ...

[...]
--
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Democracy is the form of government in which you can choose your dictator

On Tue, 12 Feb 2008 17:57:03 +0100 Michael Niedermayer <michaelni@gmx.at> wrote:
[...]
The key points here are that
* colorspace/shift for subsampled planes, etc is not specific to RAW; it's more like sample_rate or width/height
Sure, but when a "real" codec is used, it's the decoder's business to tell the app what output format it will use. NUT can provide info about the internal format used by the codec; that would help when dealing with decoders that include slow colorspace conversions. But that's definitely non-essential information, any player should be able to do without it.

However for RAW data the "decoder" needs to know the exact format used, just like some other decoders need some huffman tables or whatever. And the logical place for such information is the extradata imho.
* non raw codecs have clearly defined global headers (sometimes at least) -> thus we can't really use extradata for it

extradata would only be ok for things we definitely don't ever need for non raw
imho the 2 cases are completely different. For raw codecs we are talking about information essential to the decoder initialization. For non raw codecs we are talking about some extra information only useful in some applications. Both need to encode the same type of information, but imho they should be stored in different places.
[...]
I agree.
What about that LFE channel thing?
I was thinking about simply setting the distance to 0; however, a flag for "non-directional" channels might be better.
And where do we put this info? The stream header seems the logical place if you ask me ...
I agree, this is essential information for proper presentation; it definitely belongs there.

Albeu

On Tue, Feb 12, 2008 at 07:37:53PM +0100, Alban Bedel wrote:
Sure, but when a "real" codec is used, it's the decoder's business to tell the app what output format it will use. NUT can provide info about the internal format used by the codec; that would help when dealing with decoders that include slow colorspace conversions. But that's definitely non-essential information, any player should be able to do without it.
However for RAW data the "decoder" needs to know the exact format used, just like some other decoders need some huffman tables or whatever. And the logical place for such information is the extradata imho.
Agree.

Rich

On Tue, Feb 12, 2008 at 07:37:53PM +0100, Alban Bedel wrote:
[...]
The key points here are that
* colorspace/shift for subsampled planes, etc is not specific to RAW; it's more like sample_rate or width/height
Sure, but when a "real" codec is used, it's the decoder's business to tell the app what output format it will use. NUT can provide info about the internal format used by the codec,
Only very few codecs have headers which store information about things like the shift for subsampled planes. Thus if this information is desired it has to come from the container more often than not. If it's not desired then we also don't need it for raw IMHO.
that would help when dealing with decoders that include slow colorspace conversions.
I have no interest in supporting or helping this case, and I suspect I am not alone here.
But that's definitely non-essential information, any player should be able to do without it.
However for RAW data the "decoder" needs to know the exact format used, just like some other decoders need some huffman tables or whatever. And the logical place for such information is the extradata imho.
see above, also there really are 2 things:
1. How things are stored (packed vs. planar, the precise byte packing, ...)
2. What is stored (colorspace details like YUV BT123 vs BT567, chroma shift, channel positions)

1. defines the format, that is the packing of raw bytes; this is somehow similar to mpeg4 vs h261, thus I think it should be specified by the fourcc
2. is needed for non raw as well, which makes fourcc and extradata unusable

[...]
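For what it's worth, the packed-vs-planar distinction in point 1 can be illustrated with a toy conversion (illustrative code only, nothing from the NUT spec):

```python
def packed_to_planar(samples, nchan):
    """Rearrange interleaved samples [c0,c1,c2, c0,c1,c2, ...] into
    planar order [c0,c0,..., c1,c1,..., c2,c2,...]."""
    return [samples[i]
            for c in range(nchan)
            for i in range(c, len(samples), nchan)]
```

The sample values are identical either way; only their order in the byte stream differs, which is exactly why this falls under "how things are stored" rather than "what is stored".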
[...]
On a related subject, it might also be useful to define the channel disposition when there is more than one. Mono and stereo can get by with the classical default, but as soon as there are more channels it is really unclear. And imho such info could still be useful with 1 or 2 channels. Something like the position of each channel in polar coordinates (2D or 3D?) should be enough.
I agree.
What about that LFE channel thing?
I was thinking about simply setting the distance to 0; however, a flag for "non-directional" channels might be better.
This is wrong; LFE is not about direction but about the type of speaker. LFE stands for "Low-frequency effects". If I'd move another random speaker to distance 0 and the LFE one out, and switch channels, it won't sound correct ...
And where do we put this info? The stream header seems the logical place if you ask me ...
I agree, this is essential information for proper presentation; it definitely belongs there.
Good, now we just need to agree on some half sane way to store it.

for(i=0; i<num_channels; i++){
    x_position      s
    y_position      s
    z_position      s
    channel_flags   v
}

CHANNEL_FLAG_LFE    1

seems ok?

[...]
--
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Democracy is the form of government in which you can choose your dictator
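A reader for that proposed layout might look like this (Python sketch; get_s/get_v stand in for NUT's signed/unsigned vlc readers, whose bit-level coding is defined elsewhere in the spec — the field names mirror the proposal above):

```python
CHANNEL_FLAG_LFE = 1

def read_channel_positions(get_s, get_v, num_channels):
    """Parse the proposed per-channel fields: three signed positions
    and a flags vlc, returning one dict per channel."""
    channels = []
    for _ in range(num_channels):
        x = get_s()
        y = get_s()
        z = get_s()
        flags = get_v()
        channels.append({"x": x, "y": y, "z": z,
                         "lfe": bool(flags & CHANNEL_FLAG_LFE)})
    return channels
```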

Michael Niedermayer <michaelni@gmx.at> writes:
[...]
Only very few codecs have headers which store information about things like the shift for subsampled planes. Thus if this information is desired it has to come from the container more often than not. If it's not desired then we also don't need it for raw IMHO.
With compressed video, the decoder informs the caller of the pixel format. With raw video, this information must come from the container, one way or another.
[...]
see above, also there really are 2 things:
1. How things are stored (packed vs. planar, the precise byte packing, ...)
2. What is stored (colorspace details like YUV BT123 vs BT567, chroma shift, channel positions)
1. defines the format, that is the packing of raw bytes; this is somehow similar to mpeg4 vs h261, thus I think it should be specified by the fourcc
2. is needed for non raw as well, which makes fourcc and extradata unusable
The colourspace and whatnot are only needed if the compressed data is actually decoded, and in this case the decoder should be extracting this information from whatever headers the format uses.
[...]
Good, now we just need to agree on some half sane way to store it.

for(i=0; i<num_channels; i++){
    x_position      s
    y_position      s
    z_position      s
    channel_flags   v
}
CHANNEL_FLAG_LFE 1
seems ok?
I'm not convinced this is the right way to go. Consider a recording made with several directional microphones in the same location. Using spherical coordinates could be a solution.

Whatever the coordinate system, the location and orientation of the listener must be specified, even if there is only one logical choice.

--
Måns Rullgård
mans@mansr.com

On Tue, Feb 12, 2008 at 07:17:07PM +0000, Måns Rullgård wrote:
[...]
With compressed video, the decoder informs the caller of the pixel format. With raw video, this information must come from the container, one way or other.
Yes, I agree for the pixel format. But the decoder often does not know the fine details, like the mentioned "shift for subsampled planes", or the precise definition of YUV, or whether it uses the full luma range or not. MPEG stores these, yes, but for example huffyuv does not. So it would make some sense if this information could be stored for non raw as well.

[...]
[...]
Good, now we just need to agree on some half sane way to store it.

for(i=0; i<num_channels; i++){
    x_position      s
    y_position      s
    z_position      s
    channel_flags   v
}
CHANNEL_FLAG_LFE 1
seems ok?
I'm not convinced this is the right way to go. Consider a recording made with several directional microphones in the same location. Using spherical coordinates could be a solution.
The above was intended to specify the location of the speakers, not microphones. And spherical coordinates would just drop the distance; that's the same as setting the distance to 1 and storing that as xyz.

Actually the main reason why I didn't use spherical is that with integers there's a precision to decide on, or you end up with rationals. And this somehow starts looking messy ...
Whatever the coordinate system, the location and orientation of the listener must be specified, even if there is only one logical choice.
of course

right_position      s
forward_position    s
up_position         s

And "the listener is at (0,0,0), (1,0,0) is right, (0,1,0) is forward, (0,0,1) is up"

[...]
--
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

No human being will ever know the Truth, for even if they happen to say it by chance, they would not even known they had done so. -- Xenophanes
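With that convention (listener at the origin, x = right, y = forward, z = up), each speaker's direction follows from plain trigonometry — a sketch for illustration, not spec code:

```python
import math

def channel_direction(x, y, z):
    """Return (azimuth, elevation) in degrees for a speaker at (x, y, z):
    azimuth 0 = straight ahead, positive to the right; elevation
    positive = above the horizontal plane."""
    azimuth = math.degrees(math.atan2(x, y))
    elevation = math.degrees(math.atan2(z, math.hypot(x, y)))
    return azimuth, elevation
```

So (0,1,0) decodes as front center and (1,0,0) as hard right, matching the convention above.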

Michael Niedermayer <michaelni@gmx.at> writes:
[...]
Yes, I agree for the pixel format. But the decoder often does not know the fine details, like the mentioned "shift for subsampled planes", or the precise definition of YUV, or whether it uses the full luma range or not. MPEG stores these, yes, but for example huffyuv does not. So it would make some sense if this information could be stored for non raw as well.
Point taken, and I agree being able to transmit this information could be useful. Using extradata is obviously out of the question, which leaves either stream headers or info packets.
[...]
I'm not convinced this is the right way to go. Consider a recording made with several directional microphones in the same location. Using spherical coordinates could be a solution.
The above was intended to specify the location of the speakers, not microphones.
I'm having a hard time imagining a player moving my speakers around depending on the file being played.
And spherical coordinates would just drop the distance; that's the same as setting the distance to 1 and storing that as xyz.
Spherical coordinates without radius need only two fields.
Actually the main reason why I didn't use spherical is that with integers there's a precision to decide on, or you end up with rationals. And this somehow starts looking messy ...
I don't see any fundamental difference. If restricted to integer coordinates, an arbitrary point can be described only with a certain precision, regardless of coordinate system.
Whatever the coordinate system, the location and orientation of the listener must be specified, even if there is only one logical choice.
of course

right_position      s
forward_position    s
up_position         s
And "the listener is at (0,0,0), (1,0,0) is right, (0,1,0) is forward, (0,0,1) is up"
You're forgetting the measurement unit, i.e. metres, feet, etc.

--
Måns Rullgård
mans@mansr.com

On Tue, Feb 12, 2008 at 08:24:01PM +0000, Måns Rullgård wrote:
[...]
Point taken, and I agree being able to transmit this information could be useful. Using extradata is obviously out of the question, which leaves either stream headers or info packets.
And looking at the stream headers, there is colorspace_type, which I've apparently half forgotten ... Does anyone mind if I add chroma_x/y_pos there as well? Rich?
[...]
I'm having a hard time imagining a player moving my speakers around depending on the file being played.
And spherical coordinates would just drop the distance; that's the same as setting the distance to 1 and storing that as xyz.
Spherical coordinates without radius need only two fields.
True, but that gets tricky with integers and precision.
Actually the main reason why I didn't use spherical is that with integers there's a precision to decide on, or you end up with rationals. And this somehow starts looking messy ...
I don't see any fundamental difference. If restricted to integer coordinates, an arbitrary point can be described only with a certain precision, regardless of coordinate system.
True, but if you map the points to a sphere, then x,y,z gives you arbitrary precision on the surface of the sphere, while with spherical coordinates this needs some additional "tricks". Thus x,y,z gives you arbitrary directional precision at quite low complexity.
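That claim is easy to check numerically (a throwaway demonstration, not spec code): round a target direction to integer (x, y) at increasing scales and watch the angular error shrink.

```python
import math

def direction_error_deg(target_deg, scale):
    """Quantize a horizontal direction to integer (x, y) at the given
    scale, then measure how far the decoded angle is from the target."""
    x = round(scale * math.sin(math.radians(target_deg)))
    y = round(scale * math.cos(math.radians(target_deg)))
    return abs(math.degrees(math.atan2(x, y)) - target_deg)
```

At scale 10 the error for a 10-degree target is over a degree; at scale 10000 it is a few millidegrees, so larger integers buy directional precision with no extra fields.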
Whatever the coordinate system, the location and orientation of the listener must be specified, even if there is only one logical choice.
of course right_position s forward_position s up_position s
And "the listener is at (0,0,0), (1,0,0) is right, (0,1,0) is forward, (0,0,1) is up"
You're forgetting the measurement unit, i.e. metres, feet, etc.
Hmm, I was thinking that only x/y and x/z, that is the direction, would matter. If there's some sense in also storing the distance then we would need a 4th variable to specify the precision, like: (x/p, y/p, z/p) meters. We can surely do this if someone thinks this is useful.

[...]
--
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

When you are offended at any man's fault, turn to yourself and study your own failings. Then you will forget your anger. -- Epictetus
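Decoding a physical position under that scheme would be trivial (hypothetical helper; p is the per-stream precision divisor suggested above):

```python
def decode_position_m(x, y, z, p):
    """Map stored integers (x, y, z) and precision divisor p to a
    speaker position in metres: (x/p, y/p, z/p)."""
    if p <= 0:
        raise ValueError("precision divisor must be positive")
    return (x / p, y / p, z / p)
```

E.g. with p = 100 the stored integers are effectively centimetres.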

Michael Niedermayer <michaelni@gmx.at> writes:
>> On a related subject, it might also be useful to define the channel
>> disposition when there is more than one. Mono and stereo can go by with
>> the classical default, but as soon as there are more channels it is
>> really unclear. And imho such info could still be useful with 1 or 2
>> channels. Something like the position of each channel in polar
>> coordinates (2D or 3D?) should be enough.
>
> I agree
>
> What about that LFE channel thing?
I was thinking about simply setting the distance to 0; however, a flag for "non-directional" channels might be better.
This is wrong; LFE is not about direction but about the type of speaker. LFE stands for "low-frequency effects". If I'd move another random speaker to distance 0, move the LFE one out, and switch channels, it won't sound correct ...
> And where do we put this info? The stream header seems the logical
> place if you ask me ...
I agree, this is essential information for proper presentation; it definitely belongs there.
Good, now we just need to agree on some half-sane way to store it.

for(i=0; i<num_channels; i++){
    x_position          s
    y_position          s
    z_position          s
    channel_flags       v
}
CHANNEL_FLAG_LFE 1
seems ok?
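The proposed per-channel layout above can be sketched as a small reader. This is only an illustration of the field order, not spec text; `read_s` and `read_v` stand in for NUT's signed ("s") and unsigned ("v") variable-length integer readers, and their names are placeholders.

```python
CHANNEL_FLAG_LFE = 1

def parse_channel_positions(read_s, read_v, num_channels):
    """Read the proposed per-channel position records from a stream header."""
    channels = []
    for _ in range(num_channels):
        # Field order follows the proposal: x, y, z as signed values,
        # then a flags field as an unsigned value.
        x = read_s()
        y = read_s()
        z = read_s()
        flags = read_v()
        channels.append({"pos": (x, y, z),
                         "lfe": bool(flags & CHANNEL_FLAG_LFE)})
    return channels
```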
I'm not convinced this is the right way to go. Consider a recording made with several directional microphones in the same location. Using spherical coordinates could be a solution.
The above was intended to specify the location of the speakers, not microphones.
I'm having a hard time imagining a player moving my speakers around depending on the file being played.
And spherical coordinates would just drop the distance; that's the same as setting the distance to 1 and storing that as xyz.
Spherical coordinates without radius needs only two fields.
True, but that gets tricky with integers and precision.
Use rational numbers. That's essentially what you suggest below anyway.
Actually the main reason why I didn't use spherical is that with integers there's a precision to decide on, or you end up with rationals. And this somehow starts looking messy ...
I don't see any fundamental difference. If restricted to integer coordinates, an arbitrary point can be described only with a certain precision, regardless of coordinate system.
True, but if you map the points to a sphere, then x,y,z gives you arbitrary precision on the surface of the sphere, while with spherical coordinates this needs some additional "tricks". Thus x,y,z gives you arbitrary directional precision at quite low complexity.
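The integer-coordinate argument can be made concrete: an integer (x, y, z) triple encodes a direction whose precision grows with the magnitude of the integers, and normalising projects it onto the unit sphere. A minimal sketch:

```python
import math

def direction(x, y, z):
    """Project an integer (x, y, z) triple onto the unit sphere.

    Larger integers encode finer-grained directions; no separate
    precision field or rational angle is needed.
    """
    n = math.sqrt(x * x + y * y + z * z)
    if n == 0:
        raise ValueError("(0,0,0) encodes no direction")
    return (x / n, y / n, z / n)
```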
I didn't realise you were only interested in direction. Not that I know what you'd use the distance for, or how.
Whatever the coordinate system, the location and orientation of the listener must be specified, even if there is only one logical choice.
of course

right_position          s
forward_position        s
up_position             s
And "the listener is at (0,0,0), (1,0,0) is right, (0,1,0) is forward, (0,0,1) is up"
You're forgetting the measurement unit, i.e. metres, feet, etc.
Hmm, I was thinking that only x/y, x/z, that is, the direction, would matter. If there's some sense in also storing the distance then we would need a 4th variable to specify the precision, like: (x/p, y/p, z/p) metres.
We can surely do this if someone thinks it is useful.
I don't think it's useful, but there's no telling what people who measure the copper purity of their speaker cables might believe. -- Måns Rullgård mans@mansr.com

On Tue, 12 Feb 2008 22:20:37 +0000 Måns Rullgård <mans@mansr.com> wrote:
Whatever the coordinate system, the location and orientation of the listener must be specified, even if there is only one logical choice.
of course right_position s forward_position s up_position s
And "the listener is at (0,0,0), (1,0,0) is right, (0,1,0) is forward, (0,0,1) is up"
You're forgetting the measurement unit, i.e. metres, feet, etc.
Hmm, I was thinking that only x/y, x/z, that is, the direction, would matter. If there's some sense in also storing the distance then we would need a 4th variable to specify the precision, like: (x/p, y/p, z/p) metres.
We can surely do this if someone thinks it is useful.
I don't think it's useful, but there's no telling what people who measure the copper purity of their speaker cables might believe.
Too true :) However I doubt that real-world units would be useful here. Even if one wants to make some adjustment to better match the playback system, simply having the distances relative to one of the speakers should be enough.

Albeu

On Wed, Feb 13, 2008 at 12:29:48AM +0100, Alban Bedel wrote:
On Tue, 12 Feb 2008 22:20:37 +0000 Måns Rullgård <mans@mansr.com> wrote:
Whatever the coordinate system, the location and orientation of the listener must be specified, even if there is only one logical choice.
of course right_position s forward_position s up_position s
And "the listener is at (0,0,0), (1,0,0) is right, (0,1,0) is forward, (0,0,1) is up"
You're forgetting the measurement unit, i.e. metres, feet, etc.
Hmm, I was thinking that only x/y, x/z, that is, the direction, would matter. If there's some sense in also storing the distance then we would need a 4th variable to specify the precision, like: (x/p, y/p, z/p) metres.
We can surely do this if someone thinks it is useful.
I don't think it's useful, but there's no telling what people who measure the copper purity of their speaker cables might believe.
Too true :) However I doubt that real-world units would be useful here. Even if one wants to make some adjustment to better match the playback system, simply having the distances relative to one of the speakers should be enough.
Yes, but the extra variable doesn't hurt that much, so here's a patch with it. Comments? Rich? I can also add a flag for the case that positions aren't known, so this is skipped. We can also place this stuff in an info packet, but as it's essential for playback of at least some codecs, raw being one, and useful for others which do not allow precise positions to be stored, it seems the stream headers might be better. Also, info packets would have a much higher overhead to store this.

[...]

--
Michael

On Wed, Feb 13, 2008 at 12:42:12AM +0100, Michael Niedermayer wrote:
Yes, but the extra variable doesn't hurt that much, so here's a patch with it.
Comments? Rich? I can also add a flag for the case that positions aren't known, so this is skipped. We can also place this stuff in an info packet, but as it's essential for playback of at least some codecs, raw being one, and useful for others which do not allow precise positions to be stored, it seems the stream headers might be better. Also, info packets would have a much higher overhead to store this.
I'm against the useless physical units and denominator. Just the ratios of the positions to one another matter; the whole thing is scale-invariant for all practical purposes.

Otherwise I don't see anything wrong. I also mildly like polar/spherical coordinates better, but if you want the speakers to be able to be at different relative distances from the listener, then it's probably more of a mess, so just stick with rectangular.

Rich

On Tue, Feb 12, 2008 at 08:46:12PM -0500, Rich Felker wrote:
On Wed, Feb 13, 2008 at 12:42:12AM +0100, Michael Niedermayer wrote:
Yes, but the extra variable doesn't hurt that much, so here's a patch with it.
Comments? Rich? I can also add a flag for the case that positions aren't known, so this is skipped. We can also place this stuff in an info packet, but as it's essential for playback of at least some codecs, raw being one, and useful for others which do not allow precise positions to be stored, it seems the stream headers might be better. Also, info packets would have a much higher overhead to store this.
I'm against the useless physical units and denominator. Just the ratios of the positions to one another matter; the whole thing is scale-invariant for all practical purposes.
Well, it is not scale-invariant; yeah, the world sucks ... The speed of sound is not infinite, and more distant speakers will have their signal hit the listener later. That causes a phase shift and could change a 2*sin() to 0*sin(). So a system with a speaker at 1m and one at 2m would sound different from one with a speaker at 1km and 2km (with the volume turned up sufficiently).

I am not saying that there is any need or sense to store this, nor that I would know what a decoder would do with that information, but I do know copper purity counters will prefer NUT over other formats if the distance is stored.
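The 2*sin() to 0*sin() remark can be put in numbers. Taking the speed of sound as roughly 343 m/s (an assumed room-temperature value), a 1 m path difference delays one speaker's signal by about 2.9 ms; a tone whose half-period equals that delay arrives phase-inverted and cancels:

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def path_delay_s(d1_m, d2_m):
    """Arrival-time difference between speakers at distances d1 and d2."""
    return abs(d2_m - d1_m) / SPEED_OF_SOUND

delay = path_delay_s(1.0, 2.0)    # ~2.9 ms for a 1 m path difference
f_cancel = 1.0 / (2.0 * delay)    # a tone arriving half a period late: ~171 Hz
```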
Otherwise I don't see anything wrong.
Good, what about the flag to skip this stuff (= unknown position)? Do we want one? If not, what should the muxer store if it doesn't know the position?

[...]

--
Michael

On Wed, Feb 13, 2008 at 04:48:42AM +0100, Michael Niedermayer wrote:
I'm against the useless physical units and denominator. Just the ratios of the positions to one another matter; the whole thing is scale-invariant for all practical purposes.
Well, it is not scale-invariant; yeah, the world sucks ... The speed of sound is not infinite, and more distant speakers will have their signal hit the listener later. That causes a phase shift and could change a 2*sin() to 0*sin().
In the range of sane distances, is this really an issue? I'm thinking anywhere from headphones to theaters. Somehow I suspect the speed of sound and the frequencies involved make it irrelevant but I didn't work out the math.
So a system with a speaker at 1m and one at 2m would sound different from one with a speaker at 1km and 2km (with the volume turned up sufficiently).
WOW I want those speakers! :)
I am not saying that there is any need or sense to store this, nor that I would know what a decoder would do with that information, but I do know copper purity counters will prefer NUT over other formats if the distance is stored.
Well I would like to see some mathematical/physical reason that it's potentially useful rather than the opinions of people who pay for solid-gold *digital* audio cables to give their sound "more body"...
Otherwise I don't see anything wrong.
Good, what about the flag to skip this stuff (= unknown position)? Do we want one? If not, what should the muxer store if it doesn't know the position?
I suspect skipping the position should not be legal for >2 channels. For 2 channels or fewer, making it optional would be nice.

Rich

Rich Felker <dalias@aerifal.cx> writes:
On Wed, Feb 13, 2008 at 04:48:42AM +0100, Michael Niedermayer wrote:
I'm against the useless physical units and denominator. Just the ratios of the positions to one another matter; the whole thing is scale-invariant for all practical purposes.
Well, it is not scale-invariant; yeah, the world sucks ... The speed of sound is not infinite, and more distant speakers will have their signal hit the listener later. That causes a phase shift and could change a 2*sin() to 0*sin().
In the range of sane distances, is this really an issue? I'm thinking anywhere from headphones to theaters. Somehow I suspect the speed of sound and the frequencies involved make it irrelevant but I didn't work out the math.
Audible wavelengths range roughly from 20m to 0.02m. If there is coherence between the channels, interference effects are readily observed moving around a room with two speakers. Any half-decent surround sound system has adjustable delays per channel to allow compensating for speaker placement.
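The wavelength figures quoted above follow from lambda = c / f with c around 343 m/s (an assumed room-temperature value; the exact bounds depend on what one counts as audible):

```python
C = 343.0  # m/s, assumed speed of sound at room temperature

def wavelength_m(freq_hz):
    """Wavelength of a tone in air."""
    return C / freq_hz

low = wavelength_m(20.0)        # ~17 m at the bottom of the audible range
high = wavelength_m(17150.0)    # 0.02 m toward the top
```

Both ends of the range are comparable to room and speaker-spacing dimensions, which is why interference between coherent channels is audible when moving around.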
So a system with a speaker at 1m and one at 2m would sound different from one with a speaker at 1km and 2km (with the volume turned up sufficiently).
WOW I want those speakers! :)
I am not saying that there is any need or sense to store this, nor that I would know what a decoder would do with that information, but I do know copper purity counters will prefer NUT over other formats if the distance is stored.
Well I would like to see some mathematical/physical reason that it's potentially useful rather than the opinions of people who pay for solid-gold *digital* audio cables to give their sound "more body"...
Agree. -- Måns Rullgård mans@mansr.com

On Tue, 12 Feb 2008 23:32:56 -0500 Rich Felker <dalias@aerifal.cx> wrote:
Otherwise I don't see anything wrong.
Good, what about the flag to skip this stuff (= unknown position)? Do we want one? If not, what should the muxer store if it doesn't know the position?
I suspect skipping the position should not be legal for >2 channels. For 2 channels or fewer, making it optional would be nice.
imho it should only be optional for mono. Otherwise, even assuming the simplest stereo setup, we would need something else to keep LRLR... apart from RLRL... And having 2 fields define the same thing (with potentially conflicting values) is no good idea.

Albeu

On Tue, Feb 12, 2008 at 11:32:56PM -0500, Rich Felker wrote: [...]
I am not saying that there is any need or sense to store this, nor that I would know what a decoder would do with that information, but I do know copper purity counters will prefer NUT over other formats if the distance is stored.
Well I would like to see some mathematical/physical reason that it's potentially useful rather than the opinions of people who pay for solid-gold *digital* audio cables to give their sound "more body"...
I'd say liquid-helium-cooled superconductive cables are better (for digital, of course).

Anyway, a technical reason: if the player knows the reference locations and the actual locations, it can adjust the delays to avoid unwanted interference. And no, I don't know how well it would work in practice. Though with extreme cases like a 1km difference it could correct the resulting A/V desync at least.
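The delay adjustment described here reduces to a one-line computation: if a channel's actual speaker sits farther away than its reference position, its signal arrives late by the path difference over the speed of sound, and the player can delay the other channels to compensate. A sketch, with metres and the 343 m/s figure as assumptions:

```python
def delay_correction_s(reference_m, actual_m, speed_of_sound=343.0):
    """Extra arrival delay of a channel whose speaker moved from
    reference_m to actual_m away from the listener.

    Positive result: this channel arrives late by that many seconds,
    so the other channels should be delayed to match.
    """
    return (actual_m - reference_m) / speed_of_sound
```

For the extreme 1 km example above the correction is about 2.9 seconds, which is indeed in A/V-desync territory rather than a subtle interference effect.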
Otherwise I don't see anything wrong.
Good, what about the flag to skip this stuff (= unknown position)? Do we want one? If not, what should the muxer store if it doesn't know the position?
I suspect skipping pos should not be legal for >2 channels.. For 2 channels or fewer, making it optional would be nice.
I don't think it's too useful for 2 channels; a default of (-1,0,0) (1,0,0) would be easy for the muxer to store. And similar for 1 channel ... But if a muxer has to store 5-channel audio, figuring out the channel placement could be trickier. I am not saying I am against it; actually maybe it is the correct thing to do, and having this optional would just lead to it being missing most of the time ...

[...]

--
Michael

Michael Niedermayer wrote:
On Tue, Feb 12, 2008 at 11:32:56PM -0500, Rich Felker wrote: [...]
I am not saying that there is any need or sense to store this, nor that I would know what a decoder would do with that information, but I do know copper purity counters will prefer NUT over other formats if the distance is stored.
Well I would like to see some mathematical/physical reason that it's potentially useful rather than the opinions of people who pay for solid-gold *digital* audio cables to give their sound "more body"...
I'd say liquid-helium-cooled superconductive cables are better (for digital, of course).
Anyway, a technical reason: if the player knows the reference locations and the actual locations, it can adjust the delays to avoid unwanted interference. And no, I don't know how well it would work in practice. Though with extreme cases like a 1km difference it could correct the resulting A/V desync at least.
What if the screen is on the moon? Then the video will have a delay of a second or so. -- Måns Rullgård mans@mansr.com

On Wed, Feb 13, 2008 at 01:38:13PM -0000, Måns Rullgård wrote:
Michael Niedermayer wrote:
On Tue, Feb 12, 2008 at 11:32:56PM -0500, Rich Felker wrote: [...]
I am not saying that there is any need or sense to store this, nor that I would know what a decoder would do with that information, but I do know copper purity counters will prefer NUT over other formats if the distance is stored.
Well I would like to see some mathematical/physical reason that it's potentially useful rather than the opinions of people who pay for solid-gold *digital* audio cables to give their sound "more body"...
I'd say liquid-helium-cooled superconductive cables are better (for digital, of course).
Anyway, a technical reason: if the player knows the reference locations and the actual locations, it can adjust the delays to avoid unwanted interference. And no, I don't know how well it would work in practice. Though with extreme cases like a 1km difference it could correct the resulting A/V desync at least.
What if the screen is on the moon? Then the video will have a delay of a second or so.
The idealized decoder model could assume instantaneous video, and so the real decoder could compensate.

[...]

--
Michael

On Wed, 13 Feb 2008 13:46:14 +0100 Michael Niedermayer <michaelni@gmx.at> wrote:
On Tue, Feb 12, 2008 at 11:32:56PM -0500, Rich Felker wrote: [...]
I am not saying that there is any need or sense to store this, nor that I would know what a decoder would do with that information, but I do know copper purity counters will prefer NUT over other formats if the distance is stored.
Well I would like to see some mathematical/physical reason that it's potentially useful rather than the opinions of people who pay for solid-gold *digital* audio cables to give their sound "more body"...
I'd say liquid-helium-cooled superconductive cables are better (for digital, of course).
Anyway, a technical reason: if the player knows the reference locations and the actual locations, it can adjust the delays to avoid unwanted interference. And no, I don't know how well it would work in practice.
I think the problem is that the best interpretation (i.e. absolute in real-world units, or relative) depends on how the sound was mastered. If you take today's movies and albums you probably want a relative interpretation, so that most speakers just get the unadulterated sound. On the other hand, if someone is doing a recording which is supposed to be heard on a well-defined setup, and one wants to get the closest possible experience to what was intended, real-world units are needed. But I suspect even more is needed for a really good simulation: the echo characteristics of the room, temperature, wind, etc.

So perhaps we should just have relative coordinates for now, and add a note that we are open to adding a standard info packet for "listening environment simulation" if people from this field are interested.
Though with extreme cases like a 1km difference it could correct the resulting A/V desync at least.
:)
Otherwise I don't see anything wrong.
Good, what about the flag to skip this stuff (= unknown position)? Do we want one? If not, what should the muxer store if it doesn't know the position?
I suspect skipping pos should not be legal for >2 channels.. For 2 channels or fewer, making it optional would be nice.
I don't think it's too useful for 2 channels; a default of (-1,0,0) (1,0,0) would be easy for the muxer to store. And similar for 1 channel ... But if a muxer has to store 5-channel audio, figuring out the channel placement could be trickier. I am not saying I am against it; actually maybe it is the correct thing to do, and having this optional would just lead to it being missing most of the time ...
I agree, making it mandatory is the best option. But then we should perhaps define some defaults for typical configurations.

Albeu
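The defaults mentioned in the thread could look something like this. Only the mono and stereo values follow from the discussion above; the coordinate convention (listener at (0,0,0), +x = right, +y = forward) is the one proposed earlier, and anything beyond two channels would still need to be agreed on.

```python
# Default placements on a unit circle around the listener at (0,0,0).
# Mono and stereo follow the thread's suggestions; other counts are
# deliberately absent rather than invented.
DEFAULT_POSITIONS = {
    1: [(0, 1, 0)],                  # mono: straight ahead
    2: [(-1, 0, 0), (1, 0, 0)],      # stereo: left, right
}

def default_positions(num_channels):
    try:
        return DEFAULT_POSITIONS[num_channels]
    except KeyError:
        raise ValueError("no agreed default for %d channels" % num_channels)
```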

On Wed, Feb 13, 2008 at 04:46:09PM +0100, Alban Bedel wrote:
I think the problem is that the best interpretation (i.e. absolute in real-world units, or relative) depends on how the sound was mastered.
If you take today's movies and albums you probably want a relative interpretation, so that most speakers just get the unadulterated sound.
On the other hand, if someone is doing a recording which is supposed to be heard on a well-defined setup, and one wants to get the closest possible experience to what was intended, real-world units are needed. But I suspect even more is needed for a really good simulation: the echo characteristics of the room, temperature, wind, etc.
So perhaps we should just have relative coordinates for now, and add a note that we are open to adding a standard info packet for "listening environment simulation" if people from this field are interested.
One other idea: what about storing the numerator values always, and:

- if denominator is 0, the numerators are just relative values
- if denominator is nonzero, the values are in physical units (being American, I vote for yards! =P)
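Rich's two-case convention is simple to decode. A sketch from the consumer side, with metres as the assumed physical unit (the thread never settled on one):

```python
def position_in_metres(num, den):
    """Interpret a stored (numerator, denominator) coordinate.

    den == 0 means the coordinates are relative only, so no physical
    length can be recovered; a nonzero den scales num into metres.
    """
    if den == 0:
        return None   # relative-only; the caller keeps the raw numerator
    return num / den
```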
I don't think it's too useful for 2 channels; a default of (-1,0,0) (1,0,0) would be easy for the muxer to store. And similar for 1 channel ... But if a muxer has to store 5-channel audio, figuring out the channel placement could be trickier. I am not saying I am against it; actually maybe it is the correct thing to do, and having this optional would just lead to it being missing most of the time ...
I agree, making it mandatory is the best option. But then we should perhaps define some defaults for typical configurations.
Yes, that'd be nice. Also it would be nice if the reference demuxer could "downgrade" the position information to standard 'dumb' channel naming for callers (left, right, front, left-rear, right-rear, whatever). Rich
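One possible shape for the "downgrade" Rich suggests: collapse a coordinate triple into a conventional channel name from the signs of the components. The naming and the sign conventions (+x = right, +y = forward) are assumptions for illustration, not anything the spec fixes.

```python
def channel_name(x, y, z):
    """Crude downgrade of a speaker position to a dumb channel name."""
    if x == 0 and y == 0:
        return "center"   # also covers non-directional channels
    side = "left" if x < 0 else "right" if x > 0 else "center"
    if y < 0:
        return "rear-" + side
    return side
```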

On Wed, Feb 13, 2008 at 01:20:52PM -0500, Rich Felker wrote:
On Wed, Feb 13, 2008 at 04:46:09PM +0100, Alban Bedel wrote:
I think the problem is that the best interpretation (i.e. absolute in real-world units, or relative) depends on how the sound was mastered.
If you take today's movies and albums you probably want a relative interpretation, so that most speakers just get the unadulterated sound.
On the other hand, if someone is doing a recording which is supposed to be heard on a well-defined setup, and one wants to get the closest possible experience to what was intended, real-world units are needed. But I suspect even more is needed for a really good simulation: the echo characteristics of the room, temperature, wind, etc.
So perhaps we should just have relative coordinates for now, and add a note that we are open to adding a standard info packet for "listening environment simulation" if people from this field are interested.
One other idea: what about storing the numerator values always, and: if denominator is 0, the numerators are just relative values
and if denominator is nonzero, the values are in physical units (being American, I vote for yards! =P)
No chance! You can pick from one of these: mickey, yottaparsec, metric exainch, square root femtobarn, or if you really insist, light picosecond, zeptoparsec or siriometer.

[...]

--
Michael

Michael Niedermayer <michaelni@gmx.at> writes:
On Wed, Feb 13, 2008 at 01:20:52PM -0500, Rich Felker wrote:
On Wed, Feb 13, 2008 at 04:46:09PM +0100, Alban Bedel wrote:
I think the problem is that the best interpretation (i.e. absolute in real-world units, or relative) depends on how the sound was mastered.
If you take today's movies and albums you probably want a relative interpretation, so that most speakers just get the unadulterated sound.
On the other hand, if someone is doing a recording which is supposed to be heard on a well-defined setup, and one wants to get the closest possible experience to what was intended, real-world units are needed. But I suspect even more is needed for a really good simulation: the echo characteristics of the room, temperature, wind, etc.
So perhaps we should just have relative coordinates for now, and add a note that we are open to adding a standard info packet for "listening environment simulation" if people from this field are interested.
One other idea: what about storing the numerator values always, and: if denominator is 0, the numerators are just relative values
and if denominator is nonzero, the values are in physical units (being American, I vote for yards! =P)
No chance! You can pick from one of these: mickey, yottaparsec, metric exainch, square root femtobarn, or if you really insist, light picosecond, zeptoparsec or siriometer.
No furlongs?!? -- Måns Rullgård mans@mansr.com

On Tue, Feb 12, 2008 at 10:38:28PM +0100, Michael Niedermayer wrote:
And looking at the stream headers, there is colorspace_type, which I've apparently half forgotten ...
Does anyone mind if I add chroma_x/y_pos there as well? Rich?
I'm mildly against it. The differences are really minor and most formats specify which sampling offset is in use as part of the codec spec. For those that don't, info key/value pairs would probably be sufficient rather than cluttering the header (and easier for processes to set/query in a general-purpose demuxer/muxer api). Rich

On Tue, Feb 12, 2008 at 08:42:57PM -0500, Rich Felker wrote:
On Tue, Feb 12, 2008 at 10:38:28PM +0100, Michael Niedermayer wrote:
And looking at the stream headers, there is colorspace_type, which I've apparently half forgotten ...
Does anyone mind if I add chroma_x/y_pos there as well? Rich?
I'm mildly against it. The differences are really minor and most formats specify which sampling offset is in use as part of the codec
Ok, then I'll drop this for now.

[...]

--
Michael
participants (5)

- Alban Bedel
- michael
- Michael Niedermayer
- Måns Rullgård
- Rich Felker