[FFmpeg-devel] [PATCH] unscaled float 2 int conversion
Mon May 19 18:50:47 CEST 2008
On Mon, May 19, 2008 at 06:22:11PM +0200, Benjamin Larsson wrote:
> Michael Niedermayer wrote:
> >> And can you rerun the benchmarks on your P3 without prescaling the float
> >> buffer? I.e. change it to this:
> >> tmpa[i] = in[i] * (1.0/32768) + 385;
> >> The reason I'm asking is that it is sometimes not trivial to get the
> >> scaling for free, and then you would have to do it in the same loop that
> >> adds the bias. I suspect that this is slower on the platforms where it
> >> matters.
> > 228651 dezicycles in conv_cast, 16256 runs, 128 skips
> > 108574 dezicycles in conv_lrint, 16321 runs, 63 skips
> > 63418 dezicycles in conv_x87_asm, 16329 runs, 55 skips
> > 51975 dezicycles in conv_x87_asm_ex, 16349 runs, 35 skips
> > 54081 dezicycles in conv_bias, 16351 runs, 33 skips
> > That is with a hand-tuned conv_x87_asm_ex and a gcc-generated conv_bias.
> > If I hand-tune the fmul/fadd loop a little, leaving the integer code
> > as gcc generated it, I get:
> > 46308 dezicycles in conv_bias, 16336 runs, 48 skips
> This result puzzled me somewhat until I found out that the benchmark
> had this kind of code for all methods instead of only for the bias one:
> for (i = 0; i < SIZE; i++)
>     tmpa[i] = in[i];
> When you can't get the scaling for free, that loop should be omitted.
> It would be truly bizarre if it were always faster to do float2int by
> using the bias trick.
> So, pretty please, can you retry without that line too?
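The bias trick discussed in this thread relies on 385.0f lying in [256, 512), where one float ulp is exactly 2^-15 = 1/32768. After `in[i] * (1.0/32768) + 385`, the float addition itself rounds the sample, and the low mantissa bits hold it as a plain integer offset from the bit pattern of 385.0f (0x43C08000). Below is a minimal C sketch. It assumes IEEE-754 single precision, the default round-to-nearest mode, and input already in int16 range. The names `conv_bias_sketch` and `biased_to_int16` are hypothetical; this is not the benchmark's actual conv_bias. Unlike a bare bias loop, it also clamps out-of-range values.

```c
#include <stdint.h>
#include <string.h>

/* Bit patterns of 384.0f, 385.0f and the largest float below 386.0f.
 * Floats in [256, 512) share the exponent 2^8, so one ulp is 2^-15 and
 * the mantissa of 385 + s/32768 encodes s as an integer offset. */
#define BITS_384       INT32_C(0x43C00000)
#define BITS_385       INT32_C(0x43C08000)
#define BITS_BELOW_386 INT32_C(0x43C0FFFF)

/* Hypothetical helper: recover an int16 sample from a float already
 * scaled by 1/32768 and biased by 385. */
static int16_t biased_to_int16(float biased)
{
    int32_t bits;
    memcpy(&bits, &biased, sizeof bits);  /* reinterpret without aliasing UB */
    if (bits < BITS_384)                  /* below 384.0f, or any negative float */
        return INT16_MIN;
    if (bits > BITS_BELOW_386)            /* 386.0f and above */
        return INT16_MAX;
    return (int16_t)(bits - BITS_385);    /* exact, result fits in int16 */
}

/* Hypothetical conv_bias-style loop with the scaling done inside the loop,
 * as in tmpa[i] = in[i] * (1.0/32768) + 385. Rounding to float performs
 * round-to-nearest-even, so no explicit rounding call is needed. */
static void conv_bias_sketch(int16_t *out, const float *in, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = biased_to_int16(in[i] * (1.0f / 32768.0f) + 385.0f);
}
```

The signed comparisons against 0x43C00000 and 0x43C0FFFF do the clamping. Every negative float has its sign bit set, so as an int32 it also compares below 384.0f.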
114005 dezicycles in conv_lrint, 16352 runs, 32 skips
42600 dezicycles in conv_x87_asm, 16355 runs, 29 skips
31168 dezicycles in conv_x87_asm_ex, 16357 runs, 27 skips
So, to summarize:
if you can scale the floats for free -> bias is fastest
if you cannot scale the floats for free -> bias is still fastest
if no previous loop accesses the floats, so that you need an additional
pass to scale them -> conv_x87_asm_ex may be faster
It's only "maybe" because the conv_bias code is not fully hand-tuned, and
you assume that the floats lie between -32768 and 32767 instead of
-1.0 .. 1.0, which is an assumption that might not hold.
All of this applies only to P3/P2/PPro without SIMD, of course.
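The range caveat matters for the bias method. If the decoder's floats really are in -1.0 .. 1.0, adding 385.0f alone places every value in [384, 386), where one ulp is 1/32768. The mantissa then already holds round(x * 32768), so no fmul is needed. An fistp-style conversion of the same input would need an explicit multiply by 32768. Below is a minimal sketch assuming IEEE-754 single precision and round-to-nearest. The name `unit_float_to_int16` is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical converter for floats in -1.0 .. 1.0. The bias 385.0f does
 * the scaling: in [384, 386) one ulp is 2^-15, so the float addition
 * rounds x to a multiple of 1/32768 and the mantissa holds x * 32768. */
static int16_t unit_float_to_int16(float x)
{
    float biased = x + 385.0f;                 /* rounds to nearest, ties to even */
    int32_t bits;
    memcpy(&bits, &biased, sizeof bits);
    if (bits < INT32_C(0x43C00000))            /* below -1.0 after rounding */
        return INT16_MIN;
    if (bits > INT32_C(0x43C0FFFF))            /* 1.0 or above after rounding */
        return INT16_MAX;
    return (int16_t)(bits - INT32_C(0x43C08000)); /* 0x43C08000 is 385.0f */
}
```

With this input range, the 1/32768 multiply from the int16-range case disappears, which is the sense in which the scaling comes "for free".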
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Let us carefully observe those good qualities wherein our enemies excel us
and endeavor to excel them, by avoiding what is faulty, and imitating what
is excellent in them. -- Plutarch