[FFmpeg-devel] [Issue 664] [PATCH] Fix AAC PNS Scaling
Michael Niedermayer
michaelni
Wed Oct 8 04:30:16 CEST 2008
On Wed, Oct 08, 2008 at 04:45:52AM +0300, Uoti Urpala wrote:
> On Wed, 2008-10-08 at 02:55 +0200, Michael Niedermayer wrote:
> > On Tue, Oct 07, 2008 at 05:23:56PM -0400, Alex Converse wrote:
> > > Attached is a version that explicitly uses a int32_t.
> >
> > I will object to this patch until someone posts a speed comparison
> > between it and ff_sqrt.
>
> I tested a simple loop doing "sum += 1/sqrtf(i)" on core2. As expected,
> "1./ff_sqrt(i)" is the slowest way to calculate that. Standard
> "1./sqrtf(i)" is equally fast with default flags and somewhat faster
> with -ffast-math (and has better accuracy). The code from Alex is about
> twice as fast.
On a Pentium Dual @ 1.73GHz, ff_sqrt() is, as expected, much faster than
sqrtf(). I am rather surprised by your results; maybe you could post your
test code?
942848790 dezicycles in ff_sqrt, 1 runs, 0 skips
820105780 dezicycles in sqrtf, 1 runs, 0 skips
320925930 dezicycles in sqrt alex, 1 runs, 0 skips
689735345 dezicycles in ff_sqrt, 2 runs, 0 skips
740571455 dezicycles in sqrtf, 2 runs, 0 skips
322195770 dezicycles in sqrt alex, 2 runs, 0 skips
562373695 dezicycles in ff_sqrt, 4 runs, 0 skips
700777317 dezicycles in sqrtf, 4 runs, 0 skips
322461587 dezicycles in sqrt alex, 4 runs, 0 skips
498531962 dezicycles in ff_sqrt, 8 runs, 0 skips
681025767 dezicycles in sqrtf, 8 runs, 0 skips
323032856 dezicycles in sqrt alex, 8 runs, 0 skips
467081127 dezicycles in ff_sqrt, 16 runs, 0 skips
671043717 dezicycles in sqrtf, 16 runs, 0 skips
322701851 dezicycles in sqrt alex, 16 runs, 0 skips
450925613 dezicycles in ff_sqrt, 32 runs, 0 skips
666131574 dezicycles in sqrtf, 32 runs, 0 skips
322542049 dezicycles in sqrt alex, 32 runs, 0 skips
443217610 dezicycles in ff_sqrt, 64 runs, 0 skips
664184340 dezicycles in sqrtf, 64 runs, 0 skips
322520999 dezicycles in sqrt alex, 64 runs, 0 skips
3318.429688 3318.004883 3314.035156
One can also see here that Alex's code is about a factor of 10 less accurate
(the last line above shows the sums for ff_sqrt, sqrtf and Alex's code, in
that order). One also has to keep in mind that these are synthetic tests, and
we really should be testing with the AAC code.
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include "libavutil/common.h"
#include "libavutil/internal.h"
#include "libavutil/log.h"
/* Alex's approximation: bit-trick initial guess plus one Newton step. */
static float inv_sqrtf(float x) {
    union { float f; int i; } pun;
    float xhalf = 0.5f*x;
    pun.f = x;
    pun.i = 0x5f3759df - (pun.i>>1);
    x = pun.f;
    x = x*(1.5f-xhalf*x*x);
    return x;
}
#define N (1000000000)
#undef printf
int main(void){
    int i, j;
    float sum0=0;
    float sum1=0;
    float sum2=0;
    /* Each timed block runs 1000000 iterations (i = 1, 1001, ...). */
    for(j=0; j<100; j++){
        {START_TIMER
        for(i=1; i<N; i+=1000){
            sum0 += 1./ff_sqrt(i);
        }
        STOP_TIMER("ff_sqrt")}
        {START_TIMER
        for(i=1; i<N; i+=1000){
            sum1 += 1./sqrtf(i);
        }
        STOP_TIMER("sqrtf")}
        {START_TIMER
        for(i=1; i<N; i+=1000){
            sum2 += inv_sqrtf(i);
        }
        STOP_TIMER("sqrt alex")}
    }
    /* Printing the sums keeps the loops from being optimized away. */
    printf("%f %f %f\n", sum0, sum1, sum2);
    return 0;
}
Compiled with -O3 -ffast-math.
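The summed results only give a rough idea of accuracy, so a per-sample check may be more telling. The following is a minimal sketch, not part of the patch: it scans a range of inputs and records the largest residual |x*y*y - 1| for y = approximate 1/sqrt(x), which for small errors is roughly twice the relative error of y. The names inv_sqrtf_approx() and max_residual() are hypothetical, and memcpy is used for the type pun here instead of the union from the test program.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same bit trick and single Newton step as inv_sqrtf() in the test
 * program, but punning through memcpy and a fixed-width integer. */
static float inv_sqrtf_approx(float x)
{
    uint32_t bits;
    float xhalf = 0.5f * x;

    memcpy(&bits, &x, sizeof(bits));
    bits = 0x5f3759df - (bits >> 1);
    memcpy(&x, &bits, sizeof(x));
    return x * (1.5f - xhalf * x * x);
}

/* Largest |x*y*y - 1| over 'steps' evenly spaced samples in [lo, hi].
 * An exact y = 1/sqrt(x) would give 0 for every sample. */
static double max_residual(float lo, float hi, int steps)
{
    double worst = 0.0;

    assert(lo > 0.0f && hi > lo && steps > 1);
    for (int k = 0; k < steps; k++) {
        float  x = lo + (hi - lo) * (float)k / (float)(steps - 1);
        double y = inv_sqrtf_approx(x);
        double r = (double)x * y * y - 1.0;

        if (r < 0.0)
            r = -r;
        if (r > worst)
            worst = r;
    }
    return worst;
}
```

Calling max_residual(1.0f, 4.0f, 100000) covers one full period of the error pattern, since the approximation error repeats with every factor of 4 in x. The result should land a little under 0.004, which matches the commonly quoted maximum relative error of about 0.175% for this constant with one Newton step. Whether that is good enough for PNS scaling still has to be judged in the AAC decoder itself.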
---------
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
> ... defining _GNU_SOURCE...
For the love of all that is holy, and some that is not, don't do that.
-- Luca & Mans