[Ffmpeg-devel] channel ordering and downmixing

Mon Apr 9 04:02:59 CEST 2007

Michael Niedermayer wrote:
> Hi
> 
> On Sun, Apr 08, 2007 at 01:15:21AM -0400, Justin Ruggles wrote:
>>The attached patch is not fully-functional, but gives an idea of how it
>>might work.  The example used is encoding PCM wav to raw AC3.  The
>>user-level parts (ffmpeg.c and ffplay.c) are not implemented yet, so the
>>patch doesn't work at this point.
>>
>>Also, I don't really know what appropriate channel positions should be.
> 
> 
> just guess and add a note that they are just guessed ...
> 
>> Anyone have any ideas or references I might refer to for this?

Now I've run into quite a mess, caused by none other than Microsoft.
The two examples I've been using as models for how to implement a
multi-channel API are WAVE_FORMAT_EXTENSIBLE and CAFF.  Well, it seems that Microsoft
made a big mess, then Apple followed suit by copying Microsoft but
without giving any usage guidelines.

What I'm referring to is the channel layout description of side speakers
vs. rear speakers in the standard 5.1 home theater system.  Here is a
quote from Microsoft's "Audio Driver Support for Home Theater Speaker
Configurations" document at
http://www.microsoft.com/whdc/device/audio/SpkrConfig.mspx

[start quote]

According to the bit definitions in Figure 3, the channel mask for
recording the 5.1 stream shown on the left side of Figure 5 should be
0x60F, which assigns the six channels to the following speaker
positions: FL, FR, FC, LFE, SL, and SR. (This is the side-speaker 5.1
configuration discussed earlier.) In fact, the channel mask for the 5.1
stream is 0x3F rather than 0x60F for reasons that were mentioned
previously and will now be explained in detail.

In earlier versions of Windows (Windows 98/Me, Windows 2000, Windows XP
with SP1, and Windows Server 2003), the interpretation of the channel
mask 0x3F is that it assigns the six channels in the 5.1 format to the
following speaker positions: FL, FR, FC, LFE, BL, and BR. (This is the
back-speaker 5.1 configuration.)  However, the interpretation in Windows
XP with SP2, Windows Server 2003 with SP1, and Windows Vista is
different: by convention, the 5.1 format with the channel mask 0x3F is
interpreted to mean the side-speaker 5.1 configuration instead of the
back-speaker 5.1 configuration.

Interpreting the channel mask in this manner eliminates the requirement
to introduce a second 5.1-channel format descriptor to distinguish the
side-speaker 5.1 configuration from the back-speaker 5.1 configuration.
These two configurations are so similar that typical users might have
difficulty distinguishing between them.  Although having only a single
5.1-channel format descriptor avoids confusing users, it does require
hardware vendors to remember to interpret the 0x3F channel mask to mean
that channels 5 and 6 are assigned to the SL and SR speaker positions
instead of the BL and BR positions. In return for having to remember
this special-case interpretation of the channel mask for a 5.1 stream,
vendors can spare users the difficulty of distinguishing between two
very similar 5.1-channel format descriptors.

[end quote]
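
To make the two mask values in the quote concrete, here is a minimal sketch
in C.  The SPEAKER_* bit values are the ones defined for the
WAVE_FORMAT_EXTENSIBLE channel mask; the MASK_5POINT1_* names and the
count_channels() helper are made up for illustration and are not part of
the patch.

```c
#include <stdint.h>

/* Speaker position bits of the WAVE_FORMAT_EXTENSIBLE channel mask. */
enum {
    SPEAKER_FRONT_LEFT    = 0x001,
    SPEAKER_FRONT_RIGHT   = 0x002,
    SPEAKER_FRONT_CENTER  = 0x004,
    SPEAKER_LOW_FREQUENCY = 0x008,
    SPEAKER_BACK_LEFT     = 0x010,
    SPEAKER_BACK_RIGHT    = 0x020,
    SPEAKER_SIDE_LEFT     = 0x200,
    SPEAKER_SIDE_RIGHT    = 0x400
};

/* The four channels shared by both 5.1 variants: FL, FR, FC, LFE. */
#define MASK_FRONT_AND_LFE (SPEAKER_FRONT_LEFT | SPEAKER_FRONT_RIGHT | \
                            SPEAKER_FRONT_CENTER | SPEAKER_LOW_FREQUENCY)

/* Hypothetical names: 5.1 with back speakers (0x3F) and side speakers (0x60F). */
#define MASK_5POINT1_BACK (MASK_FRONT_AND_LFE | SPEAKER_BACK_LEFT | SPEAKER_BACK_RIGHT)
#define MASK_5POINT1_SIDE (MASK_FRONT_AND_LFE | SPEAKER_SIDE_LEFT | SPEAKER_SIDE_RIGHT)

/* Hypothetical helper: number of channels is the number of set bits. */
static int count_channels(uint32_t mask)
{
    int n = 0;
    while (mask) {
        n += (int)(mask & 1u);
        mask >>= 1;
    }
    return n;
}
```

Both masks describe six channels and share their low four bits; they
differ only in which pair of bits stands for the surround channels.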

Okay, so at least now I have a clear answer on all this "which channel
mask do I use" nonsense I've been dealing with for the past year, but
now I've run into a question.  Do we keep to Microsoft's convention for the
channel mask or break ranks to make it technically correct?  What Apple
did with CAFF was to keep Microsoft's convention, but rename
Microsoft's "back" channels to "surround" and rename the "side"
channels to "surround direct".  I think they've skirted the real issue
though by not giving specific recommendations in the spec as to what
mask values or channel labels the standard 5.1 channel setup should use.
In practice they seem not to use the channel mask or channel labels,
but instead use an enum of pre-defined layouts with only vague
information as to which actual speakers are being referred to.

I think we should probably go a different route by making a clear
distinction between side and rear "surround" speakers, especially since
we are defining explicit speaker positions.  So my inclination at this
point would be to use the WAVE_FORMAT_EXTENSIBLE naming scheme, but
actually implement it properly rather than using Microsoft's
"special-case" scenario crap for side vs. back speakers.  The downside
would be confused users, given Microsoft's admittedly incorrect
use of back speaker mask values for side speakers in 5.1.
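
One possible way to keep explicit positions internally while still
interoperating with WAV files is to translate the special case at the
file boundary.  The sketch below is only an assumption about how that
could look: the CH_* names and both functions are hypothetical, and
treating every 0x3F mask from a file as side-speaker 5.1 is a policy
choice (following the newer Windows interpretation), not something the
patch does.

```c
#include <stdint.h>

/* Hypothetical internal position bits, reusing the WAVE_FORMAT_EXTENSIBLE values. */
#define CH_BACK_LEFT   0x010u
#define CH_BACK_RIGHT  0x020u
#define CH_SIDE_LEFT   0x200u
#define CH_SIDE_RIGHT  0x400u

#define CH_BACK_PAIR   (CH_BACK_LEFT | CH_BACK_RIGHT)
#define CH_SIDE_PAIR   (CH_SIDE_LEFT | CH_SIDE_RIGHT)

#define WAV_MASK_5POINT1     0x3Fu   /* what WAV files carry for 5.1 */
#define LAYOUT_5POINT1_SIDE  0x60Fu  /* explicit side-speaker 5.1 */

/* Reading: map the ambiguous 0x3F mask to an explicit side-speaker layout.
 * All other masks are passed through unchanged. */
static uint32_t wav_mask_to_layout(uint32_t wav_mask)
{
    if (wav_mask == WAV_MASK_5POINT1)
        return (wav_mask & ~CH_BACK_PAIR) | CH_SIDE_PAIR;
    return wav_mask;
}

/* Writing: emit the conventional 0x3F mask for side-speaker 5.1 so that
 * existing players recognize the file. */
static uint32_t layout_to_wav_mask(uint32_t layout)
{
    if (layout == LAYOUT_5POINT1_SIDE)
        return (layout & ~CH_SIDE_PAIR) | CH_BACK_PAIR;
    return layout;
}
```

With this policy a true back-speaker 5.1 layout would still be written
as 0x3F unchanged, so the two configurations remain indistinguishable
inside a WAV file; the distinction only survives internally.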

I am open to any suggestions from those who have a more extensive
background in this kind of stuff.

-Justin