[NUT-devel] [nut]: r613 - docs/nutissues.txt

Michael Niedermayer michaelni at gmx.at
Tue Feb 12 22:38:28 CET 2008


On Tue, Feb 12, 2008 at 08:24:01PM +0000, Måns Rullgård wrote:
> Michael Niedermayer <michaelni at gmx.at> writes:
> 
> > On Tue, Feb 12, 2008 at 07:17:07PM +0000, Måns Rullgård wrote:
> >> Michael Niedermayer <michaelni at gmx.at> writes:
> >> 
> >> > On Tue, Feb 12, 2008 at 07:37:53PM +0100, Alban Bedel wrote:
> >> >> On Tue, 12 Feb 2008 17:57:03 +0100
> >> >> Michael Niedermayer <michaelni at gmx.at> wrote:
> >> >> 
> >> >> > On Tue, Feb 12, 2008 at 05:47:13PM +0100, Alban Bedel wrote:
> >> >> > > On Tue, 12 Feb 2008 16:00:10 +0100 (CET)
> >> >> > > michael <subversion at mplayerhq.hu> wrote:
> >> >> > > 
> >> >> > > > Modified: docs/nutissues.txt
> >> >> > > > ==============================================================================
> >> >> > > > --- docs/nutissues.txt	(original)
> >> >> > > > +++ docs/nutissues.txt	Tue Feb 12 16:00:09 2008
> >> >> > > > @@ -162,3 +162,8 @@ How do we identify the interleaving
> >> >> > > >  A. fourcc
> >> >> > > >  B. extradata
> >> >> > > 
> >> >> > > I would vote for this with a single fourcc for PCM and a single
> >> >> > > fourcc for raw video. Having info about the data format packed in
> >> >> > > the fourcc is ugly and useless. That just leads to inflexible lookup
> >> >> > > tables and the like. 
> >> >> > 
> >> >> > > Instead we should just define the format in a way similar to what
> >> >> > > mp_image provides for video (colorspace, packed or not, shift used
> >> >> > > for the subsampled planes, etc.). That would allow implementations
> >> >> > > to simply support all definable formats, instead of a selection of
> >> >> > > whatever happened to be commonly used at the time the
> >> >> > > implementation was written.
> >> >> > 
> >> >> > The key points here are that
> >> >> > * colorspace/shift for subsampled planes, etc. is not specific to RAW;
> >> >> > it's more like sample_rate or width/height
> >> >> 
> >> >> Sure, but when a "real" codec is used, it's the decoder's business to tell
> >> >> the app what output format it will use. NUT can provide info about the
> >> >> internal format used by the codec, 
> >> >
> >> > Only very few codecs have headers which store information about
> >> > things like the shift for subsampled planes. Thus if this information
> >> > is desired, it has to come from the container more often than
> >> > not. If it's not desired, then we also don't need it for raw, IMHO.
> >> 
> >> With compressed video, the decoder informs the caller of the pixel
> >> format.  With raw video, this information must come from the
> >> container, one way or other.
> >
> > Yes, I agree for pixel format.
> > But the decoder often does not know the fine details. Like as
> > mentioned "shift for subsampled plane" or the precisse definition of
> > YUV or if it uses full luma range or not. MPEG stores these yes, but
> > for example huffyuv does not. So it would make some sense if this
> > information could be stored for non raw as well.
> 
> Point taken, and I agree being able to transmit this information could
> be useful.  Using extradata is obviously out of the question, which
> leaves either stream headers or info packets.

And looking at the stream headers, there is
colorspace_type
which I've apparently half forgotten ...

Does anyone mind if I add chroma_x/y_pos there as well? Rich?
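For reference, a sketch of how that part of the stream header could then read, in the spec's own field notation (chroma_x_pos/chroma_y_pos are only the names proposed here, not anything already in the spec):

```
colorspace_type             v
chroma_x_pos                v    (proposed addition)
chroma_y_pos                v    (proposed addition)
```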


> 
> >> >> > > On a related subject, it might also be useful to define the channel
> >> >> > > disposition when there is more than one. Mono and stereo can go by
> >> >> > > with the classical default, but as soon as there are more channels
> >> >> > > it is really unclear. And IMHO such info could still be useful
> >> >> > > with 1 or 2 channels. Something like the position of each channel
> >> >> > > in polar coordinates (2D or 3D?) should be enough.
> >> >> > 
> >> >> > I agree
> >> >> > What about that LFE channel thing?
> >> >> 
> >> >> I was thinking about simply setting the distance to 0; however, a flag
> >> >> for "non-directional" channels might be better.
> >> >
> >> > This is wrong, LFE is not about direction but about the type of speaker.
> >> > LFE stands for "low-frequency effects".
> >> > If I moved another random speaker to distance 0, moved the LFE one out,
> >> > and switched the channels, it wouldn't sound correct ...
> >> >
> >> >> 
> >> >> > And where do we put this info? The stream header seems the logical
> >> >> > place if you ask me ...
> >> >> 
> >> >> I agree, this is essential information for proper presentation; it
> >> >> definitely belongs there.
> >> >
> >> > Good, now we just need to agree on some half-sane way to store it.
> >> > for(i=0; i<num_channels; i++){
> >> >     x_position                  s
> >> >     y_position                  s
> >> >     z_position                  s
> >> >     channel_flags               v
> >> > }
> >> >
> >> > CHANNEL_FLAG_LFE             1
> >> >
> >> > seems ok?
> >> 
> >> I'm not convinced this is the right way to go.  Consider a recording
> >> made with several directional microphones in the same location.  Using
> >> spherical coordinates could be a solution.
> >
> > The above was intended to specify the location of the speakers not
> > microphones.
> 
> I'm having a hard time imagining a player moving my speakers around
> depending on the file being played.
> 
> > And spherical coordinates would just drop the distance; that's the same
> > as setting the distance to 1 and storing that as x,y,z.
> 
> Spherical coordinates without a radius need only two fields.

True, but that gets tricky with integers and precision.


> 
> > Actually, the main reason why I didn't use spherical is that with integers
> > there's a precision to decide on, or you end up with rationals. And this
> > somehow starts looking messy ...
> 
> I don't see any fundamental difference.  If restricted to integer
> coordinates, an arbitrary point can be described only with a certain
> precision, regardless of coordinate system.

True, but if you map the points onto a sphere, then x,y,z gives you arbitrary
precision on the surface of the sphere, while with spherical coordinates
this needs some additional "tricks".
Thus x,y,z gives you arbitrary directional precision at quite low complexity.


> 
> >> Whatever the coordinate system, the location and orientation of the
> >> listener must be specified, even if there is only one logical choice.
> >
> > of course
> > right_position               s
> > forward_position             s
> > up_position                  s
> >
> > And
> > "the listener is at (0,0,0), (1,0,0) is right, (0,1,0) is forward,
> > (0,0,1) is up"
> 
> You're forgetting the measurement unit, i.e. metres, feet, etc.

Hmm, I was thinking that only x/y and x/z, that is, the direction, would matter.
If there's some sense in also storing the distance, then we would need a 4th
variable to specify the precision, like:
(x/p, y/p, z/p) metres

We can surely do this if someone thinks it is useful.

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

When you are offended at any man's fault, turn to yourself and study your
own failings. Then you will forget your anger. -- Epictetus

