[FFmpeg-devel] [PATCH] unscaled float 2 int conversion

Benjamin Larsson banan
Fri Aug 1 09:35:40 CEST 2008


Michael Niedermayer wrote:
> On Thu, Jul 31, 2008 at 08:45:23PM +0300, Ivan Kalvachev wrote:
>   
>> On 5/19/08, Michael Niedermayer <michaelni at gmx.at> wrote:
>>     
>>> On Mon, May 19, 2008 at 06:22:11PM +0200, Benjamin Larsson wrote:
>>>       
>>>> Michael Niedermayer wrote:
>>>>         
>>>>>> And can you rerun the benchmarks on your P3 but not prescale the float
>>>>>> buffer? I.e. change the loop to this:
>>>>>>
>>>>>> tmpa[i] = in[i]* (1.0/32768) + 385;
>>>>>>
>>>>>> The reason I'm wondering is that sometimes it's not trivial to get the
>>>>>> scaling for free, and then you would have to do it during the loop that
>>>>>> adds the bias. I suspect that it is slower on platforms where it matters.
>>>>>>
>>>>>>             
>>>>> 228651 dezicycles in conv_cast, 16256 runs, 128 skips
>>>>> 108574 dezicycles in conv_lrint, 16321 runs, 63 skips
>>>>> 63418 dezicycles in conv_x87_asm, 16329 runs, 55 skips
>>>>> 51975 dezicycles in conv_x87_asm_ex, 16349 runs, 35 skips
>>>>> 54081 dezicycles in conv_bias, 16351 runs, 33 skips
>>>>>
>>>>> That is with hand-tuned conv_x87_asm_ex and gcc-generated conv_bias.
>>>>> If I just hand-tune the fmul/fadd loop a little, with the integer code
>>>>> left as gcc generated it, I get
>>>>> 46308 dezicycles in conv_bias, 16336 runs, 48 skips
>>>>>
>>>>>           
>>>> This result puzzled me somewhat until I found out that the benchmark
>>>> tests had this kind of code for all methods instead of only for the bias
>>>> code:
>>>>
>>>> for(i=0; i<SIZE; i++)
>>>>   tmpa[i] = in[i];
>>>>
>>>> In the case when you can't get scaling for free that loop should be
>>>> omitted. It would be truly bizarre if it was always faster to do
>>>> float2int by using the bias trick.
>>>>
>>>> So pretty please can you retry without that line also ?
>>>>         
>>> 114005 dezicycles in conv_lrint, 16352 runs, 32 skips
>>> 42600 dezicycles in conv_x87_asm, 16355 runs, 29 skips
>>> 31168 dezicycles in conv_x87_asm_ex, 16357 runs, 27 skips
>>>
>>> So to summarize
>>> if you can scale the floats for free -> bias is fastest
>>> if you cannot scale the floats for free -> bias is fastest
>>> if you do not have any previous loop accessing the floats, thus you need an
>>> additional pass to scale the floats -> conv_x87_asm_ex is maybe faster
>>>
>>> It's just "maybe" because the conv_bias code is not fully hand tuned and
>>> you assume that the floats would be between -32768 .. 32767 instead of
>>> -1.0 .. 1.0, which is an assumption which might not hold.
>>>
>>> all only on P3/P2/PPro with no SIMD of course
>>>       
>> After some rant on irc, I got to look at the float_to_int16() function.
>>
>> One of the rants is about the function's input range of [-1;1],
>> which means samples have to be rescaled before use.
>>
>> The function can work just fine with an input range of [-32768;32767]
>> if a bias of 385*32768 is used instead of 385. This changes only
>> the exponent and keeps the fraction bits in the same place.
>> The only modification to the function is the constant used for clipping:
>>
>> @@ -3948,7 +3948,7 @@
>>  static av_always_inline int float_to_int16_one(const float *src){
>>      int_fast32_t tmp = *(const int32_t*)src;
>>      if(tmp & 0xf0000){
>> -        tmp = (0x43c0ffff - tmp)>>31;
>> +        tmp = (0x4b40ffff - tmp)>>31;
>>
>> I hope this helps somehow.
>>     
>
> It's interesting how we missed this for so long ...
> This way all the non-constant scale factors can be dropped. You don't happen
> to also have an idea on how to get rid of the +385*C ? :)
>
> [...]

For reference, I found this page:
http://www.df.lth.se/~john_e/gems/gem0042.html
While it doesn't do float to int16, it could be good for float to int24
and int32.
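
To make the unscaled variant quoted above concrete, here is a minimal C
sketch of the bias trick for input that is already in [-32768;32767]. The
function name float_to_int16_unscaled is hypothetical, and the sketch uses
memcpy and an explicit comparison where the quoted patch uses a pointer cast
and >>31, so treat it as an illustration under those assumptions and not as
the actual dsputil code. As with the original trick, the clipping test only
catches moderate overshoot; input around +-2^20 or beyond is not detected.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: converts one float in [-32768;32767] to int16
 * using the bias 385*32768, as suggested in the quoted mail. */
static int16_t float_to_int16_unscaled(float sample)
{
    /* 385*32768 = 12615680 lies in [2^23, 2^24), where one float ulp is 1,
     * so the addition rounds the sample to an integer that ends up in the
     * low mantissa bits. The bias itself has the bit pattern 0x4b408000. */
    float biased = sample + 385.0f * 32768.0f;
    int32_t bits;

    /* Read the bit pattern without violating strict aliasing. */
    memcpy(&bits, &biased, sizeof bits);

    /* In range, the pattern is 0x4b40xxxx, so bits 16..19 are zero.
     * Any of them being set means the sample overshot the int16 range. */
    if (bits & 0xf0000) {
        /* Same clipping constant as in the quoted patch. */
        bits = (bits > 0x4b40ffff) ? 0xffff : 0;
    }

    /* The low 16 bits hold sample + 0x8000; remove the offset again. */
    return (int16_t)((bits & 0xffff) - 0x8000);
}
```

With this, float_to_int16_unscaled(100.25f) gives 100, and an out-of-range
input such as 40000.0f is clipped to 32767.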

MvH
Benjamin Larsson