[FFmpeg-devel] [PATCH] unscaled float 2 int conversion

Thu Jul 31 19:45:23 CEST 2008

On 5/19/08, Michael Niedermayer <michaelni at gmx.at> wrote:
> On Mon, May 19, 2008 at 06:22:11PM +0200, Benjamin Larsson wrote:
>> Michael Niedermayer wrote:
>> >> And can you rerun the benchmarks on your P3, but without prescaling the
>> >> float buffer? I.e. change it to this:
>> >>
>> >> tmpa[i] = in[i]* (1.0/32768) + 385;
>> >>
>> >> The reason I'm wondering is that sometimes it's not trivial to get the
>> >> scaling for free, and then you would have to do it during the loop to
>> >> add the bias. I suspect that it is slower on platforms where it matters.
>> >>
>> >
>> > 228651 dezicycles in conv_cast, 16256 runs, 128 skips
>> > 108574 dezicycles in conv_lrint, 16321 runs, 63 skips
>> > 63418 dezicycles in conv_x87_asm, 16329 runs, 55 skips
>> > 51975 dezicycles in conv_x87_asm_ex, 16349 runs, 35 skips
>> > 54081 dezicycles in conv_bias, 16351 runs, 33 skips
>> >
>> > that is with hand-tuned conv_x87_asm_ex and gcc-generated conv_bias
>> > if i just hand-tune the fmul/fadd loop a little, with the integer code
>> > left as gcc generated it, i get
>> > 46308 dezicycles in conv_bias, 16336 runs, 48 skips
>> >
>>
>>
>> This result puzzled me somewhat until I found out that the benchmark
>> tests had this kind of code for all methods instead of only for the bias
>> code:
>>
>> for(i=0; i<SIZE; i++)
>>   tmpa[i] = in[i];
>>
>> In the case where you can't get scaling for free, that loop should be
>> omitted. It would be truly bizarre if it were always faster to do
>> float2int by using the bias trick.
>>
>> So pretty please can you retry without that line also ?
>
> 114005 dezicycles in conv_lrint, 16352 runs, 32 skips
> 42600 dezicycles in conv_x87_asm, 16355 runs, 29 skips
> 31168 dezicycles in conv_x87_asm_ex, 16357 runs, 27 skips
>
> So to summarize
> if you can scale the floats for free -> bias is fastest
> if you cannot scale the floats for free -> bias is fastest
> if you do not have any previous loop accessing the floats, thus you need an
> additional pass to scale the floats -> conv_x87_asm_ex is maybe faster
>
> It's just "maybe" because the conv_bias code is not fully hand-tuned, and
> you assume that the floats would be between -32768 and 32767 instead of
> -1.0 .. 1.0, which is an assumption that might not hold.
>
> all only on P3/P2/PPro with no SIMD of course

After some ranting on IRC, I took a look at the float_to_int16() function.

One of the complaints is that the function expects input in the range
[-1;1], so samples have to be rescaled before they are used.

The function works just as well with the input range [-32768;32767]
if a bias of 385*32768 is used instead of 385. This changes only the
exponent and keeps the fraction bits in the same place.
The only modification the function needs is the constant used for clipping:

@@ -3948,7 +3948,7 @@
 static av_always_inline int float_to_int16_one(const float *src){
     int_fast32_t tmp = *(const int32_t*)src;
     if(tmp & 0xf0000){
-        tmp = (0x43c0ffff - tmp)>>31;
+        tmp = (0x4b40ffff - tmp)>>31;
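
To illustrate why only the clipping constant changes, here is a minimal
standalone sketch of both variants in C. The names from_biased_bits,
conv_unit_range and conv_int16_range are made up for this example and are
not FFmpeg functions. The bias is added inside the helpers for clarity
(in the thread it is added by an earlier loop), memcpy replaces the pointer
cast to avoid aliasing problems, and the clip is written as a comparison
rather than the shift used in the patch. It assumes IEEE 754 single
precision and round-to-nearest.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: extract a clipped int16 from an already biased float.
 * clip_max is the bit pattern of the largest in-range biased value. */
static int16_t from_biased_bits(float biased, int32_t clip_max)
{
    int32_t bits;
    memcpy(&bits, &biased, sizeof bits);      /* reinterpret the float's bits */
    if (bits & 0xf0000)                       /* bits 16..19 set: out of range */
        bits = (bits > clip_max) ? 0xffff : 0;
    /* low 16 bits hold sample + 32768; remove the offset */
    return (int16_t)((bits & 0xffff) - 0x8000);
}

/* Input in [-1;1): bias 385.0f has bit pattern 0x43c08000 and the float
 * spacing near it is 2^-15, so the low 16 bits count steps of 1/32768. */
static int16_t conv_unit_range(float x)
{
    return from_biased_bits(x + 385.0f, 0x43c0ffff);
}

/* Input in [-32768;32767]: bias 385*32768 has bit pattern 0x4b408000 and
 * the float spacing near it is 1, so the low 16 bits count whole units. */
static int16_t conv_int16_range(float x)
{
    return from_biased_bits(x + 385.0f * 32768.0f, 0x4b40ffff);
}
```

With these helpers, conv_unit_range(0.5f) and conv_int16_range(16384.0f)
both yield 16384, and moderately out-of-range inputs saturate to -32768 or
32767. The guard inspects only four bits, so it catches moderate overshoot,
not arbitrary values.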

I hope this helps somehow.