[FFmpeg-devel] [PATCH] unscaled float 2 int conversion
Thu May 15 23:56:49 CEST 2008
On Thu, May 15, 2008 at 11:17:40PM +0200, Michael Niedermayer wrote:
> On Thu, May 15, 2008 at 09:14:15PM +0200, Benjamin Larsson wrote:
> > Michael Niedermayer wrote:
> > >> Well, when I tried the last time I didn't get it to work; there was some
> > >> overlap issue that wasn't trivial to sort out.
> > >
> > > You just add 384, or whatever it was, after the windowing/overlap.
> > >
> > Just to be clear, this bias scale thing is about not having to use the
> > fistp fpu instruction, or whatever it is called on other cpus. To perform
> > it you first scale your samples down to the range -1 to 1. This scaling
> > operation is most often performed for free by scaling a suitable table
> > somewhere. Then you add 384 so that the sample can be read straight out
> > of the float's bit pattern as an integer. So you trade a float add
> > against fistp, which must have been faster on some cpu (or else they
> > wouldn't have used it).
> > In FFmpeg we also have 3dnow, sse and altivec code that can do float to
> > int16 conversion. I think we can agree that the simd code is faster than
> > the bias trick on all processors that support the simd code. Then we
> > are left with Intel cpus before P3, the Motorola G3 and various other
> > cpus with only fpus and no simd unit. I'm pretty sure that this trick is
> > the best when we are dealing with P2 cpus and lower, but I'm not sure it
> > is for the G3.
> > So then we come to the matter of performance: you want benchmarks to
> > justify changing or adding a new scaling method. As I don't have access
> > to any machines that don't have a simd unit, I can't do any usable
> > benchmarks. But I'm quite sure that if I had access, they would show that
> > doing the bias trick is faster. So one could argue that, well, ok,
> > then we keep the code as it is. But my opinion is that we should scrap
> > this anyway: it makes the code complex, it slows down the simd code
> > (very little though) for no good reason, and it complicates the
> > development of a proper audio api and filter system. Cpus with slow fpus
> > should use fixed point code instead.
> > So I propose that we start cleaning this out.
> Ohh well, why do i always have to do the work? You could have saved me
> some time by just saying that you won't do the benchmarks.
> PS: yes, i don't give a damn what you or anyone else thinks; either
> i see benchmarks or people can go talking to their next wall.
> It would have taken you less time to disable MMX*/SSE* and write
> a benchmark than explaining why it's better not to.
2nd try, now it is a P3
gcc-4.3 -O2 -fno-math-errno
221951 dezicycles in conv_cast, 16254 runs, 130 skips
107203 dezicycles in conv_lrint, 16291 runs, 93 skips
103967 dezicycles in conv_bias, 16286 runs, 98 skips
gcc-4.2 -O2 -fno-math-errno -lm
214423 dezicycles in conv_cast, 16250 runs, 134 skips
114627 dezicycles in conv_lrint, 16325 runs, 59 skips
53196 dezicycles in conv_bias, 16334 runs, 50 skips
gcc-4.1 -O2 -fno-math-errno -lm
212703 dezicycles in conv_cast, 16258 runs, 126 skips
111271 dezicycles in conv_lrint, 16318 runs, 66 skips
84831 dezicycles in conv_bias, 16316 runs, 68 skips
gcc-4.0 -O2 -fno-math-errno -lm
215119 dezicycles in conv_cast, 16274 runs, 110 skips
169588 dezicycles in conv_lrint, 16282 runs, 102 skips
53398 dezicycles in conv_bias, 16338 runs, 46 skips
gcc-3.4 -O2 -fno-math-errno -lm
215642 dezicycles in conv_cast, 16221 runs, 163 skips
105947 dezicycles in conv_lrint, 16318 runs, 66 skips
48505 dezicycles in conv_bias, 16338 runs, 46 skips
After a little bit of hacking on the code:
65010 dezicycles in conv_lrint, 16321 runs, 63 skips
but this is still quite a bit slower.
So it seems the bias code is faster on P3 (P2/PPro) cpus,
which also means i won't approve its removal unless someone
beats gcc-3.4 -O2 conv_bias on a P3/P2/PPro.
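
For readers who have not seen the bias trick discussed above, here is a minimal sketch in C. The benchmarked source is not included in this post, so the function bodies below are assumptions that only reuse the names conv_cast and conv_bias from the results; the clipping and the memcpy-based bit read are illustrative choices, and IEEE 754 single precision with round-to-nearest is assumed. Adding 384.0f to a sample in [-1, 1) puts the result in [383, 385), where one unit in the last place is 2^-15, so the float's bit pattern is 0x43C00000 plus the sample scaled to int16.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical reference version: scale, clip, then use a plain C cast.
   The cast truncates toward zero. */
static int16_t conv_cast(float x)
{
    float s = x * 32768.0f;
    if (s >  32767.0f) s =  32767.0f;
    if (s < -32768.0f) s = -32768.0f;
    return (int16_t)s;
}

/* Hypothetical bias-trick version, assuming IEEE 754 binary32 floats.
   384.0f has the bit pattern 0x43C00000; for x in [-1, 1) the sum
   x + 384.0f stays in the same binade, so the integer distance from
   0x43C00000 is x * 32768 rounded by the FPU's current rounding mode. */
static int16_t conv_bias(float x)
{
    float biased = x + 384.0f;
    int32_t bits;

    /* Read the bit pattern without violating aliasing rules. */
    memcpy(&bits, &biased, sizeof bits);

    /* Positive floats order like integers, and any negative float has
       the sign bit set, so two integer compares are enough to clip. */
    if (bits > 0x43C07FFF) return  32767;
    if (bits < 0x43BF8000) return -32768;

    return (int16_t)(bits - 0x43C00000);
}
```

In this sketch the conversion itself costs one float add plus integer work, which is the trade against the FPU's float-to-int store described in the quoted text. In a real decoder the scaling to [-1, 1) and the bias would be folded into existing tables or the windowing step rather than applied per sample.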
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
it is not once nor twice but times without number that the same ideas make
their appearance in the world. -- Aristotle