[FFmpeg-devel] Audio conversion and floating-point codecs

Sat Jul 10 10:37:54 CEST 2010

Hi,

> On Sat, Jul 10, 2010 at 08:23 PM, Peter Ross wrote: 
> On Tue, Jul 06, 2010 at 03:13:26PM +0100, Måns Rullgård wrote:
> > Peter Ross <pross at xvid.org> writes:
> > 
> > > On Sat, May 15, 2010 at 08:17:51PM +0100, Måns Rullgård wrote:
> > >> There is a long-standing desire from some to make the
> > >> floating-point decoders output float samples instead of converting
> > >> to int16 internally, and I agree with the reasons for this.
> > >> However, making this change hastily will make decoding orders of
> > >> magnitude slower on many CPUs.  The reason is that when a decoder
> > >> outputs float samples, the fast asm code for float-to-int
> > >> conversion is not used.
> > >> 
> > >> In order to change the output format of these decoders without
> > >> impacting performance, we must first make a few improvements to the
> > >> avcodec API and to the generic audio format conversion code.
> > > [...]
> > >
> > >> - The decoders should output planar audio instead of interleaved for
> > >>   multichannel streams.  This probably means introducing
> > >>   avcodec_decode_audio4() with an AVFrame output.
> > >
> > > Q: does it make sense to expand the existing AVFrame structure, or
> > > define a new struct specific to audio?
> > >
> > > #define FF_MAX_CHANNELS  8
> > > struct AVAudioFrame {
> > >     uint16_t *data[FF_MAX_CHANNELS];
> > > };
> > 
> > I've posed the same question myself, without finding a good answer.
> > Some codecs support a huge number of channels.  I can say for sure,
> > however, that uint16_t is the wrong data type to use here.
> 
> Oops. I intended int16_t.
> 
> Second contentious point:
> At present, the user allocates the samples buffer that is handed off
> to avcodec_decode_audioN().
> 
> IMHO this is sloppy. Just look at how ffmpeg.c guesses the buffer size.
> The alternative is to have the decoder do it, e.g. by calling
> avctx->get_buffer() with the number of samples/channels to be output.
> Thoughts?
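
To make the conversion step from the quoted discussion concrete, here is a rough scalar sketch of turning planar float samples into interleaved int16_t. The function names are made up for illustration and are not part of libavcodec; the fast asm paths mentioned above would do the same job with SIMD. It assumes samples are nominally in [-1.0, 1.0).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert one float sample to int16_t with clipping. */
static int16_t float_to_s16(float x)
{
    float scaled = x * 32768.0f;

    if (scaled != scaled)              /* NaN: map to silence (an assumption) */
        return 0;
    if (scaled >= 32767.0f)
        return INT16_MAX;
    if (scaled <= -32768.0f)
        return INT16_MIN;
    /* Round to nearest, halves away from zero. */
    return (int16_t)(scaled < 0.0f ? scaled - 0.5f : scaled + 0.5f);
}

/* Hypothetical helper: interleave planar float channels into packed int16_t.
 * src[ch] points to nb_samples floats for channel ch;
 * dst must hold nb_samples * channels values. */
static void planar_float_to_interleaved_s16(int16_t *dst,
                                            const float *const *src,
                                            size_t nb_samples, int channels)
{
    for (size_t i = 0; i < nb_samples; i++)
        for (int ch = 0; ch < channels; ch++)
            dst[i * channels + ch] = float_to_s16(src[ch][i]);
}
```

With planar decoder output, a loop like this would only run when the application actually asks for interleaved int16_t.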

Regarding the audio sample format: wouldn't it be nice if the user (the one using libav...) could choose the native audio sample format from a supported list (e.g. uint8_t, int16_t, int32_t, float, ...), which all audio functions would then use as their default sample format? It would work like a C++ template that can be instantiated with uint8_t, int16_t, etc.
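
Since the code base is C, a macro can stand in for the template. The following is only a sketch of what I have in mind: the enum, the macro and all function names are hypothetical and do not exist in libavcodec. It assumes the decoder produces float samples nominally in [-1.0, 1.0) and does not handle NaN input.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical list of user-selectable sample formats. */
enum sketch_sample_fmt {
    SKETCH_FMT_U8,
    SKETCH_FMT_S16,
    SKETCH_FMT_S32,
    SKETCH_FMT_FLT
};

/* Macro standing in for a C++ template: generates one float -> integer
 * converter per target type. The value is scaled, offset, clipped to
 * [lo, hi] and rounded to nearest. */
#define DEFINE_FROM_FLOAT(name, type, scale, bias, lo, hi)          \
static void name(type *dst, const float *src, size_t n)             \
{                                                                   \
    for (size_t i = 0; i < n; i++) {                                \
        double v = (double)src[i] * (scale) + (bias);               \
        if (v < (lo)) v = (lo);                                     \
        if (v > (hi)) v = (hi);                                     \
        dst[i] = (type)(v < 0 ? v - 0.5 : v + 0.5);                 \
    }                                                               \
}

DEFINE_FROM_FLOAT(flt_to_u8,  uint8_t, 128.0,        128.0, 0.0,           255.0)
DEFINE_FROM_FLOAT(flt_to_s16, int16_t, 32768.0,      0.0,   -32768.0,      32767.0)
DEFINE_FROM_FLOAT(flt_to_s32, int32_t, 2147483648.0, 0.0,   -2147483648.0, 2147483647.0)

/* Convert n float samples into the format the user selected.
 * Returns 0 on success, -1 for an unknown format. */
static int sketch_convert_from_float(void *dst, enum sketch_sample_fmt fmt,
                                     const float *src, size_t n)
{
    switch (fmt) {
    case SKETCH_FMT_U8:  flt_to_u8(dst, src, n);              return 0;
    case SKETCH_FMT_S16: flt_to_s16(dst, src, n);             return 0;
    case SKETCH_FMT_S32: flt_to_s32(dst, src, n);             return 0;
    case SKETCH_FMT_FLT: memcpy(dst, src, n * sizeof(float)); return 0;
    }
    return -1;
}
```

The application would pick the format once, and every audio function would then go through a dispatch like this instead of hard-coding int16_t.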

I know this is a lot of work, because it touches so many parts of the code. But if we are thinking about supporting more than just int16_t (which I think is a good idea and high time), all the possibilities should be on the table.

Just my 0.02 €
Axel