[FFmpeg-devel] [Issue 664] [PATCH] Fix AAC PNS Scaling
Uoti Urpala
uoti.urpala
Wed Oct 8 07:40:25 CEST 2008
On Wed, 2008-10-08 at 07:02 +0300, Uoti Urpala wrote:
> 312743900 dezicycles in ff_sqrt, 64 runs, 0 skips
> 130205690 dezicycles in sqrtf, 64 runs, 0 skips
> 327773023 dezicycles in sqrt alex, 64 runs, 0 skips
> 6420.692419 6419.931566 6413.858814
For fun, I also added the following vectorized version, which uses the
rsqrtps instruction through a GCC builtin:
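{
START_TIMER
/* i, N and sum3 come from the surrounding benchmark */
typedef float v4sf __attribute__((vector_size(16)));
v4sf sum = {0, 0, 0, 0};
v4sf cnt = {1, 1001, 2001, 3001};      /* lanes: i, i+1000, i+2000, i+3000 */
v4sf add = {4000, 4000, 4000, 4000};
for (i = 1; i < N; i += 4000) {
    sum += __builtin_ia32_rsqrtps(cnt); /* approximate 1/sqrt of each lane */
    cnt += add;
}
/* add up the four lanes */
for (i = 0; i < 4; i++)
    sum3 += ((float *)&sum)[i];
STOP_TIMER("rsqrtps")
}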
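As a portable sketch (not the code that produced the timings here), the same four-lane loop can be written with the SSE intrinsics from <xmmintrin.h>, where _mm_rsqrt_ps compiles to rsqrtps, and checked against an exact reference that uses sqrtps followed by a division over the same points. The function name lane_rsqrt_sum and its approx flag are hypothetical, the FFmpeg timer macros are left out, and an x86 target with SSE is assumed.

```c
#include <xmmintrin.h>  /* SSE intrinsics (x86); _mm_rsqrt_ps emits rsqrtps */

/* Hypothetical helper: sums 1/sqrt(x) for x = 1, 1001, 2001, ... with four
 * lanes per iteration, the same lane layout as the loop above. If approx is
 * nonzero the lanes use rsqrtps; otherwise they use the correctly rounded
 * sqrtps followed by divps. n plays the role of N; keep it below 2^24 so
 * every x is exactly representable as a float. */
static float lane_rsqrt_sum(int n, int approx)
{
    __m128 sum = _mm_setzero_ps();
    __m128 cnt = _mm_setr_ps(1.0f, 1001.0f, 2001.0f, 3001.0f);
    const __m128 add = _mm_set1_ps(4000.0f);
    const __m128 one = _mm_set1_ps(1.0f);
    float lanes[4];
    float total = 0.0f;
    int i;

    for (i = 1; i < n; i += 4000) {
        __m128 r = approx ? _mm_rsqrt_ps(cnt)
                          : _mm_div_ps(one, _mm_sqrt_ps(cnt));
        sum = _mm_add_ps(sum, r);
        cnt = _mm_add_ps(cnt, add);
    }
    /* horizontal add: store the lanes instead of casting &sum to float * */
    _mm_storeu_ps(lanes, sum);
    for (i = 0; i < 4; i++)
        total += lanes[i];
    return total;
}
```

For example, lane_rsqrt_sum(4000000, 1) and lane_rsqrt_sum(4000000, 0) both sum the same 4000 values (1000 iterations of four lanes) in the same order, so the gap between the two results comes essentially from the rsqrtps approximation alone.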
With this, the results are as follows (the other three versions were
compiled with -ffast-math, which makes no difference for the rsqrtps one):
311880190 dezicycles in ff_sqrt, 64 runs, 0 skips
130455941 dezicycles in sqrtf, 64 runs, 0 skips
326274114 dezicycles in sqrt alex, 64 runs, 0 skips
10004567 dezicycles in rsqrtps, 64 runs, 0 skips
6420.692419 6419.931566 6413.858814 6421.396637
rsqrtps returns only an approximate value; for the sum starting at 1 its
accuracy lies between that of ff_sqrt and sqrt alex. It is, of course, more
than an order of magnitude faster than anything else: about 13 times faster
than sqrtf and about 31 times faster than ff_sqrt.
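To put a number on "approximate", a sketch like the following measures the worst relative error of rsqrtps against the correctly rounded sqrtps plus divps over x = 1..n. The function name is hypothetical, an x86 target with SSE is assumed, and the exact figure depends on the CPU; Intel documents a maximum relative error of 1.5 * 2^-12 for rsqrtps.

```c
#include <xmmintrin.h>  /* SSE intrinsics (x86) */

/* Hypothetical helper: largest relative error of rsqrtps compared with
 * 1/sqrt computed as sqrtps followed by divps, over x = 1..n
 * (n should be a multiple of 4 and below 2^24). */
static float rsqrtps_max_rel_error(int n)
{
    const __m128 one = _mm_set1_ps(1.0f);
    float approx[4], exact[4];
    float worst = 0.0f;
    int i, k;

    for (i = 1; i <= n; i += 4) {
        __m128 x = _mm_setr_ps((float)i, (float)(i + 1),
                               (float)(i + 2), (float)(i + 3));
        _mm_storeu_ps(approx, _mm_rsqrt_ps(x));
        _mm_storeu_ps(exact, _mm_div_ps(one, _mm_sqrt_ps(x)));
        for (k = 0; k < 4; k++) {
            float err = (approx[k] - exact[k]) / exact[k];
            if (err < 0.0f)
                err = -err;
            if (err > worst)
                worst = err;
        }
    }
    return worst;
}
```

On hardware that meets the documented bound, rsqrtps_max_rel_error(65536) should come out at or below roughly 3.7e-4, i.e. about 12 correct bits, against the 24-bit mantissa of a float.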