[NUT-devel] [nut]: r613 - docs/nutissues.txt
Måns Rullgård
mans at mansr.com
Tue Feb 12 21:24:01 CET 2008
Michael Niedermayer <michaelni at gmx.at> writes:
> On Tue, Feb 12, 2008 at 07:17:07PM +0000, Måns Rullgård wrote:
>> Michael Niedermayer <michaelni at gmx.at> writes:
>>
>> > On Tue, Feb 12, 2008 at 07:37:53PM +0100, Alban Bedel wrote:
>> >> On Tue, 12 Feb 2008 17:57:03 +0100
>> >> Michael Niedermayer <michaelni at gmx.at> wrote:
>> >>
>> >> > On Tue, Feb 12, 2008 at 05:47:13PM +0100, Alban Bedel wrote:
>> >> > > On Tue, 12 Feb 2008 16:00:10 +0100 (CET)
>> >> > > michael <subversion at mplayerhq.hu> wrote:
>> >> > >
>> >> > > > Modified: docs/nutissues.txt
>> >> > > > ==============================================================================
>> >> > > > --- docs/nutissues.txt (original)
>> >> > > > +++ docs/nutissues.txt Tue Feb 12 16:00:09 2008
>> >> > > > @@ -162,3 +162,8 @@ How do we identify the interleaving
>> >> > > > A. fourcc
>> >> > > > B. extradata
>> >> > >
>> >> > > I would vote for this, with a single fourcc for PCM and a single
>> >> > > fourcc for raw video. Having info about the data format packed into
>> >> > > the fourcc is ugly and useless; it just leads to inflexible lookup
>> >> > > tables and the like.
>> >> >
>> >> > > Instead we should just define the format in a way similar to what
>> >> > > mp_image provides for video (colorspace, packed or not, shift used
>> >> > > for the subsampled planes, etc.). That would allow implementations
>> >> > > to simply support every definable format, instead of a selection of
>> >> > > whatever happened to be commonly used formats at the time the
>> >> > > implementation was written.
>> >> >
>> >> > The key points here are that
>> >> > * colorspace/shift for subsampled planes, etc. is not specific to RAW;
>> >> >   it's more like sample_rate or width/height
>> >>
>> >> Sure, but when a "real" codec is used, it's the decoder's business to
>> >> tell the app what output format it will use. NUT can provide info
>> >> about the internal format used by the codec,
>> >
>> > Only very few codecs have headers which store information about
>> > things like the shift for subsampled planes. Thus, if this information
>> > is desired, it has to come from the container more often than
>> > not. If it's not desired, then we don't need it for raw either, IMHO.
>>
>> With compressed video, the decoder informs the caller of the pixel
>> format. With raw video, this information must come from the
>> container, one way or another.
>
> Yes, I agree for pixel format.
> But the decoder often does not know the fine details, like the
> mentioned "shift for subsampled planes", the precise definition of
> YUV, or whether the full luma range is used or not. MPEG stores these,
> yes, but huffyuv, for example, does not. So it would make some sense if
> this information could be stored for non-raw streams as well.
Point taken, and I agree that being able to transmit this information
could be useful. Using extradata is obviously out of the question,
which leaves either stream headers or info packets.
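Either way, the payload would be roughly the set of fields mp_image
carries. A sketch of what that could look like (all field names here
are hypothetical; nothing like this is in the spec yet):

    /* Hypothetical per-stream format description, roughly the
     * information mp_image carries. None of these fields exist in
     * NUT; they only illustrate the scope of the discussion. */
    typedef struct {
        int colorspace;      /* e.g. BT.601 vs BT.709 YUV matrix  */
        int full_range;      /* 1 = full luma range, 0 = 16..235  */
        int planar;          /* planar vs packed layout           */
        int chroma_shift_x;  /* horiz. shift of subsampled planes */
        int chroma_shift_y;  /* vert. shift of subsampled planes  */
        int bits_per_sample;
    } raw_format_t;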
>> >> > > On a related subject, it might also be useful to define the channel
>> >> > > disposition when there is more than one. Mono and stereo can get by
>> >> > > with the classical default, but as soon as there are more channels
>> >> > > it is really unclear. And IMHO such info could still be useful
>> >> > > with 1 or 2 channels. Something like the position of each channel
>> >> > > in polar coordinates (2D or 3D?) should be enough.
>> >> >
>> >> > I agree
>> >> > What about that LFE channel thing?
>> >>
>> >> I was thinking about simply setting the distance to 0; however, a flag
>> >> for "non-directional" channels might be better.
>> >
>> > This is wrong; LFE is not about direction but about the type of speaker.
>> > LFE stands for "low-frequency effects".
>> > If I moved some other random speaker to distance 0, moved the LFE one
>> > out, and switched the channels, it wouldn't sound correct ...
>> >
>> >>
>> >> > And where do we put this info? The stream header seems the logical
>> >> > place if you ask me ...
>> >>
>> >> I agree; this is essential information for proper presentation, so it
>> >> definitely belongs there.
>> >
>> > Good, now we just need to agree on some half-sane way to store it:
>> >
>> >     for(i=0; i<num_channels; i++){
>> >         x_position    s
>> >         y_position    s
>> >         z_position    s
>> >         channel_flags v
>> >     }
>> >
>> > CHANNEL_FLAG_LFE 1
>> >
>> > seems ok?
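For reference, reading that structure is only a few lines of C using
the v and s coding from nut.txt. A sketch only: read_byte() and all
the type/function names are invented here.

    #include <stdint.h>

    extern int read_byte(void);  /* assumed: next byte of the stream */

    /* "v" per nut.txt: 7 data bits per byte, MSB set on every byte
     * except the last. */
    static uint64_t get_v(void)
    {
        uint64_t val = 0;
        int b;
        do {
            b = read_byte();
            val = (val << 7) | (b & 0x7f);
        } while (b & 0x80);
        return val;
    }

    /* "s" per nut.txt: signed value mapped onto "v" in the order
     * 0, 1, -1, 2, -2, ... */
    static int64_t get_s(void)
    {
        uint64_t tmp = get_v() + 1;
        return (tmp & 1) ? -(int64_t)(tmp >> 1) : (int64_t)(tmp >> 1);
    }

    #define CHANNEL_FLAG_LFE 1

    typedef struct {
        int64_t  x, y, z;  /* speaker position; unit undecided */
        uint64_t flags;    /* CHANNEL_FLAG_LFE, ...            */
    } channel_t;

    static void read_channels(channel_t *ch, uint64_t num_channels)
    {
        for (uint64_t i = 0; i < num_channels; i++) {
            ch[i].x     = get_s();
            ch[i].y     = get_s();
            ch[i].z     = get_s();
            ch[i].flags = get_v();
        }
    }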
>>
>> I'm not convinced this is the right way to go. Consider a recording
>> made with several directional microphones in the same location. Using
>> spherical coordinates could be a solution.
>
> The above was intended to specify the location of the speakers, not
> the microphones.
I'm having a hard time imagining a player moving my speakers around
depending on the file being played.
> And spherical coordinates would just drop the distance; that's the same
> as setting the distance to 1 and storing that as xyz.
Spherical coordinates without a radius need only two fields.
> Actually, the main reason why I didn't use spherical coordinates is
> that with integers there's a precision to decide on, or you end up
> with rationals. And this somehow starts looking messy ...
I don't see any fundamental difference. If restricted to integer
coordinates, an arbitrary point can be described only with a certain
precision, regardless of coordinate system.
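To make the trade-off concrete: the two-field spherical variant would
store two fixed-point angles, and the scale factor is exactly the
precision that has to be decided on. A sketch, with centidegrees as an
arbitrary example choice:

    #include <math.h>
    #include <stdint.h>

    #define ANGLE_SCALE 100.0  /* centidegrees; assumed precision */

    /* azimuth: clockwise from straight ahead; elevation: up from the
     * horizontal plane. Output axes: x = right, y = forward, z = up. */
    static void angles_to_xyz(int64_t azimuth, int64_t elevation,
                              double *x, double *y, double *z)
    {
        double az = azimuth   / ANGLE_SCALE * M_PI / 180.0;
        double el = elevation / ANGLE_SCALE * M_PI / 180.0;
        *x = cos(el) * sin(az);
        *y = cos(el) * cos(az);
        *z = sin(el);
    }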
>> Whatever the coordinate system, the location and orientation of the
>> listener must be specified, even if there is only one logical choice.
>
> Of course:
>
>     right_position   s
>     forward_position s
>     up_position      s
>
> And:
> "the listener is at (0,0,0), (1,0,0) is right, (0,1,0) is forward,
> (0,0,1) is up"
You're forgetting the measurement unit, i.e. metres, feet, etc.
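As an illustration of the whole proposal, a conventional 5.1 layout
under that convention, with the usual ITU-R BS.775 angles (fronts at
+/-30 degrees, surrounds at +/-110 degrees) and an arbitrary radius of
100 in whatever unit gets chosen, would come out roughly as:

    #define CHANNEL_FLAG_LFE 1  /* from the proposal above */

    /* Hypothetical 5.1 example; channel order arbitrary, positions
     * rounded to integers. */
    static const struct { int x, y, z; unsigned flags; } layout_5_1[6] = {
        { -50,  87, 0, 0                },  /* front left,     -30 deg */
        {  50,  87, 0, 0                },  /* front right,    +30 deg */
        {   0, 100, 0, 0                },  /* center,           0 deg */
        {   0,   0, 0, CHANNEL_FLAG_LFE },  /* LFE, non-directional    */
        { -94, -34, 0, 0                },  /* surround left, -110 deg */
        {  94, -34, 0, 0                },  /* surround right,+110 deg */
    };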
--
Måns Rullgård
mans at mansr.com