
Michael Niedermayer <michaelni@gmx.at> writes:
On Tue, Feb 12, 2008 at 07:17:07PM +0000, Måns Rullgård wrote:
Michael Niedermayer <michaelni@gmx.at> writes:
On Tue, Feb 12, 2008 at 07:37:53PM +0100, Alban Bedel wrote:
On Tue, 12 Feb 2008 17:57:03 +0100 Michael Niedermayer <michaelni@gmx.at> wrote:
On Tue, Feb 12, 2008 at 05:47:13PM +0100, Alban Bedel wrote:
On Tue, 12 Feb 2008 16:00:10 +0100 (CET) michael <subversion@mplayerhq.hu> wrote:
> Modified: docs/nutissues.txt
> ==============================================================================
> --- docs/nutissues.txt (original)
> +++ docs/nutissues.txt Tue Feb 12 16:00:09 2008
> @@ -162,3 +162,8 @@
> How do we identify the interleaving
> A. fourcc
> B. extradata
I would vote for this, with a single fourcc for PCM and a single fourcc for raw video. Packing information about the data format into the fourcc is ugly and useless; it just leads to inflexible lookup tables and the like.
Instead we should just define the format in a way similar to what mp_image provides for video (colorspace, packed or not, shift used for the subsampled planes, etc.). That would allow implementations to simply support all definable formats, instead of a selection of whatever happened to be commonly used at the time the implementation was written.
The key point here is that colorspace, shift for subsampled planes, etc. are not specific to raw; they are more like sample_rate or width/height.
Sure, but when a "real" codec is used, it's the decoder's business to tell the app what output format it will use. NUT can provide information about the internal format used by the codec,
Only very few codecs have headers which store information about things like the shift for subsampled planes. Thus, if this information is desired, it has to come from the container more often than not. If it's not desired, then IMHO we don't need it for raw either.
With compressed video, the decoder informs the caller of the pixel format. With raw video, this information must come from the container, one way or other.
Yes, I agree for the pixel format. But the decoder often does not know the fine details, like the mentioned shift for subsampled planes, the precise definition of YUV, or whether the full luma range is used or not. MPEG stores these, yes, but huffyuv for example does not. So it would make some sense if this information could be stored for non-raw as well.
Point taken, and I agree that being able to transmit this information could be useful. Using extradata is obviously out of the question, which leaves either stream headers or info packets.
On a related subject, it might also be useful to define the channel disposition when there is more than one channel. Mono and stereo can get by with the classical defaults, but as soon as there are more channels it is really unclear. And IMHO such info could still be useful with 1 or 2 channels. Something like the position of each channel in polar coordinates (2D or 3D?) should be enough.
I agree. What about that LFE channel thing?
I was thinking about simply setting the distance to 0; however, a flag for "non-directional" channels might be better.
This is wrong; LFE is not about direction but about the type of speaker. LFE stands for "low-frequency effects". If I moved some other random speaker to distance 0, moved the LFE one out, and switched the channels, it would not sound correct ...
And where do we put this info? The stream header seems the logical place, if you ask me ...
I agree. This is essential information for proper presentation; it definitely belongs there.
Good, now we just need to agree on some half-sane way to store it.

for(i=0; i<num_channels; i++){
    x_position      s
    y_position      s
    z_position      s
    channel_flags   v
}
CHANNEL_FLAG_LFE 1
seems ok?
I'm not convinced this is the right way to go. Consider a recording made with several directional microphones in the same location. Using spherical coordinates could be a solution.
The above was intended to specify the location of the speakers not microphones.
I'm having a hard time imagining a player moving my speakers around depending on the file being played.
And spherical coordinates would just drop the distance; that's the same as setting the distance to 1 and storing that as xyz.
Spherical coordinates without radius needs only two fields.
Actually, the main reason why I didn't use spherical coordinates is that with integers there's a precision to decide on, or you end up with rationals. And this somehow starts looking messy ...
I don't see any fundamental difference. If restricted to integer coordinates, an arbitrary point can be described only with a certain precision, regardless of coordinate system.
Whatever the coordinate system, the location and orientation of the listener must be specified, even if there is only one logical choice.
Of course:

right_position    s
forward_position  s
up_position       s
And "the listener is at (0,0,0), (1,0,0) is right, (0,1,0) is forward, (0,0,1) is up"
You're forgetting the measurement unit, i.e. metres, feet, etc. -- Måns Rullgård mans@mansr.com