[FFmpeg-devel] Audio conversion and floating-point codecs

Tue May 18 00:23:31 CEST 2010

Frank Barchard <fbarchard at google.com> writes:

> On Sat, May 15, 2010 at 12:17 PM, Måns Rullgård <mans at mansr.com> wrote:
>
>> There is a long-standing desire from some to make the floating-point
>> decoders output float samples instead of converting to int16
>> internally, and I agree with the reasons for this.  However, making
>> this change hastily will make decoding orders of magnitude slower on
>> many CPUs.  The reason is that when a decoder outputs float samples,
>> the fast asm code for float-to-int conversion is not used.
>>
>> In order to change the output format of these decoders without
>> impacting performance, we must first make a few improvements to the
>> avcodec API and to the generic audio format conversion code.
>>
>> What we have
>> ------------
>>
>> - Very fast float-to-int16 conversion code in dsputil.  These
>>  functions require input scaled to -32k..32k.
>>
>
> Adding scaling to this code wouldn't slow it down much.

I know what I'm talking about, ...

> Here's a scalar float to int I used for wmapro conversion:
>
> const __m128 kFloatScaler = _mm_set1_ps( 2147483648.0f );
> static void FloatToIntSaturate(float* p) {
>   __m128 a = _mm_set1_ps(*p);
>   a = _mm_mul_ss(a, kFloatScaler);
>   *reinterpret_cast<int32*>(p) = _mm_cvtss_si32(a);
> }

... whereas about whoever wrote that POS, I'm not so sure.

>> - The codecs in question all scale the output to the correct range as
>>  part of transforms or filters.  The scaling is thus effectively free.
>>
>> - Generic sample format conversion code (audioconvert.c).  This code
>>  requires float input in the range -1..1.  It does not use any asm
>>  and is thus excruciatingly slow.  Decoding wmapro on Cortex-A8
>>  spends more than 50% of the total time here.
>>
>
> Short term, I would prefer wmapro output int16 and have ffmpeg do it
> efficiently on arm.

Yes, that is the plan.

> If you pass float through as-is to Pulse or kmixer, performance is poor
> and/or you get stutter.
> If you leave conversion to applications, they'll tend to do it poorly,
> especially on less common CPUs.

I know that, which is why we need a decent conversion API.

>> What we need
>> ------------
>>
>> - The libavcodec API needs to be amended such that a specific scaling
>>  can be requested of the decoders.  This should probably be done
>>  similarly to how channel down-mixing is already handled.
>
> It's better to require a consistent -1 to 1.

No, that would be slower.
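
A rough sketch of why, in plain C.  This is not actual libavcodec code;
the function names and the windowing step are hypothetical stand-ins for
whatever transform or filter a decoder already runs.  If the requested
scale is folded into coefficients the decoder multiplies by anyway, the
per-sample cost is the same for any output range, whereas a fixed -1..1
output forces the converter to do a separate multiply later.

```c
#include <stddef.h>

/* Hypothetical init step: bake the requested output scale into the
 * window coefficients once, outside the per-sample loop. */
static void init_scaled_window(float *win, const float *base,
                               size_t n, float scale)
{
    for (size_t i = 0; i < n; i++)
        win[i] = base[i] * scale;
}

/* Per-frame step: one multiply per sample, whatever scale was chosen. */
static void apply_window(float *out, const float *in,
                         const float *win, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * win[i];
}
```

With scale = 32768 the output is ready for a float-to-int16 routine that
expects -32k..32k; with scale = 1 it stays in -1..1.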

>> - The decoders should output planar audio instead of interleaved for
>>  multichannel streams.  This probably means introducing
>>  avcodec_decode_audio4() with an AVFrame output.
>
> Planar requires interleaving before it can be played.  Is there a
> compelling advantage?

1. The user might want it like that.
2. The existing float2int16 asm does interleaving more or less for free.
   A separate interleaving pass would definitely be slower.
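
To illustrate point 2, here is a minimal scalar sketch in C of converting
planar float to interleaved int16 in a single pass.  It is not the dsputil
asm; the function names are hypothetical, and it assumes the input is
already scaled to -32768..32767 and contains no NaNs.

```c
#include <stdint.h>

/* Round to nearest and clip to the int16 range.  NaN is not handled. */
static int16_t float_to_int16_one(float v)
{
    if (v >=  32767.0f) return  32767;
    if (v <= -32768.0f) return -32768;
    return (int16_t)(v < 0 ? v - 0.5f : v + 0.5f);
}

/*
 * src[c] points to the samples of channel c (planar layout).
 * dst receives nb_samples * channels interleaved int16 samples.
 * Conversion and interleaving happen in the same store.
 */
static void float_to_int16_interleave_sketch(int16_t *dst,
                                             const float *const *src,
                                             int nb_samples, int channels)
{
    for (int i = 0; i < nb_samples; i++)
        for (int c = 0; c < channels; c++)
            dst[i * channels + c] = float_to_int16_one(src[c][i]);
}
```

Each sample is read once and written once.  Interleaving as a separate
step would add another full read and write of the buffer.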

> Also note that at some point, float video channels would be good.

ROTFL

-- 
Måns Rullgård
mans at mansr.com