[FFmpeg-devel] [PATCH] unscaled float 2 int conversion

Sun May 18 16:59:53 CEST 2008

Michael Niedermayer wrote:
> On Sun, May 18, 2008 at 12:12:04AM +0200, Benjamin Larsson wrote:
>> [...]
>>>> so how should we go forward from this when we work on 
>>>> implementing a new audio api. The codecs should output samples in their 
>>>> native format, that is what I think most of us agree on. But what is the 
>>>> native format for a codec outputting samples in float when running in 
>>>> simd mode and the same when running in non simd mode ?
>>> SAMPLE_FMT_FLT
>>> and
>>> SAMPLE_FMT_FLT_BIAS_385
>> The reason I keep bitching about this is that SAMPLE_FMT_FLT_BIAS_385
>> output is cumbersome to use if you want to add a filter after you have
>> decoded a codec frame.
> 
> I do not understand this problem. Each filter (if we ever do have audio
> filters) supports specific formats and conversion filters would be
> inserted as needed.
> Only the conversion filter needs to care about SAMPLE_FMT_FLT_BIAS_385.
> 
> 
>> What I propose then is that we only use the bias trick when we are
>> outputting 16bit samples directly after the decoder.
> 
> Of course, that's the whole idea behind SAMPLE_FMT_FLT_BIAS_385.
> Also the second last filter might choose to output SAMPLE_FMT_FLT_BIAS_385
> if it knows that the next filter prefers it and converts to 16bit.
> 

How about this: (Some of which has been discussed before)

Add

CODEC_CAP_AFRAME
================
For audio codecs this means that the codec will output/receive the data
in a (struct AVFrame). Initially no codecs will have this capability.
In the end, all audio codecs should be converted to this.

Decoders with CODEC_CAP_AFRAME will output samples in their
native format (as opposed to int16_t) and thus, set avctx->sample_fmt
to that format.

Encoders will set avctx->sample_fmt during init and will expect that
samples are delivered in that format by the caller.

For float output the native range is -1.0 to +1.0.
8, 16 and 32-bit integers are obvious.
I assume that 24bit would be stored as -2^23 to +2^23 in an int32_t.
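
To make the ranges above concrete, here is a rough sketch of converting a
float sample in the -1.0 to +1.0 range to 16-bit and to 24-bit-in-int32_t
samples. The function names are made up for illustration and are not part
of any existing API. Note that in two's complement the largest positive
24-bit value is 2^23 - 1, so the sketch clips there.

```c
#include <stdint.h>

/* Hypothetical helper: float in the nominal range -1.0..+1.0 to int16_t,
 * rounding to nearest and clipping at the ends of the range. */
static int16_t sketch_float_to_s16(float s)
{
    float v = s * 32768.0f;                 /* 2^15 */
    if (v >=  32767.0f) return INT16_MAX;
    if (v <= -32768.0f) return INT16_MIN;
    return (int16_t)(v + (v >= 0.0f ? 0.5f : -0.5f));
}

/* Hypothetical helper: float to a 24-bit sample held in an int32_t,
 * which is the storage assumed in the text above. */
static int32_t sketch_float_to_s24(float s)
{
    float v = s * 8388608.0f;               /* 2^23 */
    if (v >=  8388607.0f) return  8388607;  /* 2^23 - 1 */
    if (v <= -8388608.0f) return -8388608;  /* -2^23 */
    return (int32_t)(v + (v >= 0.0f ? 0.5f : -0.5f));
}
```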

CODEC_CAP_SCALE_BIAS
====================
This capability signals that the decoder honors the (two new) fields
avctx->sample_scale and avctx->sample_bias (both float I think)
in an efficient manner. (e.g. a codec could keep a local copy
of sample_scale, and if it detects that the value has changed, the codec
can recompute its coefficients, or whatever it uses).
Thus, this could also be used to change volume of the output
during playback.
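
A rough sketch of how this could look, under some assumptions: the two
structs below are stand-ins I made up (they are not AVCodecContext), the
"coefficients" are reduced to a single gain, and a 32-bit IEEE 754 float
is assumed. With sample_scale = 1.0 and sample_bias = 385.0 the output
lands in 384.0 to 386.0, where one float step equals 1/32768, so the low
16 bits of the float hold the sample plus 0x8000. That is the bias trick
discussed above.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the two proposed avctx fields. */
typedef struct SketchContext {
    float sample_scale;
    float sample_bias;
} SketchContext;

/* Hypothetical decoder state: keeps a local copy of sample_scale. */
typedef struct SketchDecoder {
    float cached_scale;
    float gain;        /* stands in for the codec's real coefficients */
    int   recomputed;  /* counts recomputations, for illustration only */
} SketchDecoder;

/* Apply scale and bias to n samples, recomputing only when the
 * caller has changed sample_scale since the previous call. */
static void sketch_output(SketchDecoder *d, const SketchContext *c,
                          const float *in, float *out, int n)
{
    int i;
    if (d->recomputed == 0 || d->cached_scale != c->sample_scale) {
        d->cached_scale = c->sample_scale;
        d->gain         = c->sample_scale;
        d->recomputed++;
    }
    for (i = 0; i < n; i++)
        out[i] = in[i] * d->gain + c->sample_bias;
}

/* Extract an int16_t from a float that carries the 385.0 bias.
 * Values outside 384.0..386.0 are clipped. */
static int16_t sketch_biased_to_int16(float biased)
{
    uint32_t bits;
    if (biased <  384.0f) return INT16_MIN;
    if (biased >= 386.0f) return INT16_MAX;
    memcpy(&bits, &biased, sizeof(bits));   /* read the raw float bits */
    return (int16_t)((int32_t)(bits & 0xffff) - 0x8000);
}
```

Halving sample_scale between calls halves the output level, which is the
volume-change use mentioned above.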

I'm not sure if it is worth implementing scaling and bias on the
encoder side though.

...

The current versions of avcodec_decode_audio() can then be adapted to
use these capabilities for conversion to int16_t and thus preserve
both the external API and speed.
Also, we'll get rid of some float2int code duplication.
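
As a sketch of the decision the legacy entry point would have to make
(the flag values and all names below are invented for illustration; the
proposal does not assign any bits):

```c
/* Hypothetical flag values for the two proposed capabilities. */
#define SKETCH_CAP_AFRAME     0x1
#define SKETCH_CAP_SCALE_BIAS 0x2

typedef enum SketchPath {
    SKETCH_PATH_LEGACY_S16,      /* codec already emits int16_t */
    SKETCH_PATH_BIAS_TRICK,      /* set scale/bias, then extract int16_t */
    SKETCH_PATH_GENERIC_CONVERT  /* native format, shared float2int code */
} SketchPath;

/* Pick how the legacy int16_t API would obtain its samples. */
static SketchPath sketch_choose_path(int capabilities)
{
    if (!(capabilities & SKETCH_CAP_AFRAME))
        return SKETCH_PATH_LEGACY_S16;
    if (capabilities & SKETCH_CAP_SCALE_BIAS)
        return SKETCH_PATH_BIAS_TRICK;
    return SKETCH_PATH_GENERIC_CONVERT;
}
```

Only the last path needs the shared conversion code, which is where the
dsputil context would be required.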

avcodec_decode_audio() just needs access to a dsputil context somehow.
Perhaps a (dsputil *) could be pointed to via avctx as well.
Ugly indeed, better suggestions are most welcome :)

Once this is done we should expose the native sample formats directly
via avcodec_{en,de}code_audio3() or a similar function.

The step after that would be to start arranging for audio filters
and such things.

I'm trying to come up with a solution that allows us to divide
the work into smaller pieces and take one step at a time,
because I think that's the only way we can make progress on this subject.