[FFmpeg-devel] [Issue 664] [PATCH] Fix AAC PNS Scaling

Tue Oct 7 23:23:56 CEST 2008

On Tue, Oct 7, 2008 at 5:01 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> On Tue, Oct 07, 2008 at 04:40:20PM -0400, Alex Converse wrote:
>> On Tue, Oct 7, 2008 at 4:27 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
>> > On Tue, Oct 07, 2008 at 03:23:50PM -0400, Alex Converse wrote:
>> >> On Tue, Oct 7, 2008 at 1:01 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
>> >> > On Mon, Oct 06, 2008 at 11:30:56PM -0400, Alex Converse wrote:
>> >> >> On Mon, Oct 6, 2008 at 10:20 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
>> >> >> > On Mon, Oct 06, 2008 at 09:52:51PM -0400, Alex Converse wrote:
>> >> >> >> On Mon, Oct 6, 2008 at 9:39 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
>> >> >> >> > On Mon, Oct 06, 2008 at 08:52:06PM -0400, Alex Converse wrote:
>> >> >> >> >> On Mon, Oct 6, 2008 at 8:22 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
>> >> >> >> >> > On Mon, Oct 06, 2008 at 03:46:55PM -0400, Alex Converse wrote:
>> >> >> >> >> >> On Tue, Sep 30, 2008 at 11:25 PM, Alex Converse <alex.converse at gmail.com> wrote:
>> >> >> >> >> >> > Hi,
>> >> >> >> >> >> >
>> >> >> >> >> >> > The attached patch should fix AAC PNS scaling [Issue 664]. It will not
>> >> >> >> >> >> > fix PNS conformance.
>> >> >> >> >> >>
>> >> >> >> >> >> Here's a slightly updated patch (sqrtf instead of sqrt). The current
>> >> >> >> >> >> method of PNS will never conform because sample energy simply doesn't
>> >> >> >> >> >> converge to its mean fast enough. The spec explicitly states that PNS
>> >> >> >> >> >> should be normalized per band. Not doing it that way causes PNS-1
>> >> >> >> >> >> conformance to fail for 45 bands.
>> >> >> >> >> >
>> >> >> >> >> > elaborate, what part of the spec says what?
>> >> >> >> >>
>> >> >> >> >> 14496-3:2005/4.6.13.3 p184 (636 of the PDF)
>> >> >> >> >>
>> >> >> >> >> > what is PNS-1 conformance ?
>> >> >> >> >>
>> >> >> >> >> 14496-4:2004/6.6.1.2.2.4 p94 (102 PDF)
>> >> >> >> >> 14496-5/conf_pns folder
>> >> >> >> >
>> >> >> >> > do you happen to have URLs for these?
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >>
>> >> >> >> >> > the part that feels a little odd is normalizing random data on arbitrary
>> >> >> >> >> > and artificial bands, this simply makes things non random.
>> >> >> >> >> > This would be most extremely visible with really short bands of 1 or 2
>> >> >> >> >> > coeffs ...
>> >> >> >> >> > another way to see the issue is to take 100 coeffs and split them into
>> >> >> >> >> > 10 bands, if you now normalize literally these 10 bands then the 100
>> >> >> >> >> > coeffs will no longer be random at all, they will be significantly
>> >> >> >> >> > correlated. This may be inaudible, it may or may not sound better and
>> >> >> >> >> > may or may not be what the spec wants but still it feels somewhat wrong
>> >> >> >> >> > to me ...
>> >> >> >> >> >
>> >> >> >> >>
>> >> >> >> >> Ralph Sperschneider from FhG/MPEG spelled it all out:
>> >> >> >> >> http://lists.mpegif.org/pipermail/mp4-tech/2003-June/002358.html
>> >> >> >> >>
>> >> >> >> >> I'm not saying it's a smart way to design a CODEC but it's what MPEG picked.
>> >> >> >> >
>> >> >> >> > yes, so i guess the most sensible solution would be to precalculate
>> >> >> >> > a second of noise normalized to the band sizes and randomly pick from
>> >> >> >> > these.
>> >> >> >> >
>> >> >> >>
>> >> >> >> That sounds messy and overly complex. What's wrong with doing it the
>> >> >> >> way MPEG tells us to?
>> >> >> >
>> >> >> > that is what mpeg tells us to do, they do not mandate any specific way
>> >> >> > to calculate random values. And i do not like doing sqrt() per band ...
>> >> >> >
>> >> >>
>> >> >> One sqrtf() per band isn't that intense. To stick with the current
>> >> >> approach we still need to do a sqrt on the band size. We could even
>> >> >> use one of those fast 1/sqrt algorithms.
>> >> >
>> >> > we do not need to do a sqrt() on the band size, not in the current
>> >> > approach and not with the other variant. A small LUT will do fine
>> >> > considering the small number of band sizes. And even that is not
>> >> > needed in all cases ...
>> >> >
>> >>
>> >> I'm attaching a version that is functionally correct that does do 1
>> >> sqrtf per band (aka up to 120 per frame).
>> >>
>> >> I'm using the Carmack-Lomont 1/sqrtf based on the following benchmark:
>> >>
>> >> With math.h sqrtf
>> >> alex at Barcelona:~/Projects/ffmpeg/14496-4$ ../ffmpeg/ffmpeg -i
>> >> mpeg4audio-conformance/compressedMp4/al18_48.mp4 -f null -
>> >> FFmpeg version SVN-r15584, Copyright (c) 2000-2008 Fabrice Bellard, et al.
>> >>   configuration: --enable-gpl
>> >>   libavutil     49.11. 0 / 49.11. 0
>> >>   libavcodec    52. 0. 0 / 52. 0. 0
>> >>   libavformat   52.22. 1 / 52.22. 1
>> >>   libavdevice   52. 1. 0 / 52. 1. 0
>> >>   built on Oct  7 2008 14:26:51, gcc: 4.3.2
>> >> Input #0, mov,mp4,m4a,3gp,3g2,mj2, from
>> >> 'mpeg4audio-conformance/compressedMp4/al18_48.mp4':
>> >>   Duration: 00:01:00.01, start: 0.000000, bitrate: 67 kb/s
>> >>     Stream #0.0(und): Audio: aac, 48000 Hz, mono, s16
>> >> Output #0, null, to 'pipe:':
>> >>     Stream #0.0(und): Audio: pcm_s16le, 48000 Hz, mono, s16, 768 kb/s
>> >> Stream mapping:
>> >>   Stream #0.0 -> #0.0
>> >> Press [q] to stop encoding
>> >> 253170 dezicycles in sqrtf, 1 runs, 0 skips
>> >> 245160 dezicycles in sqrtf, 2 runs, 0 skips
>> >> 240997 dezicycles in sqrtf, 4 runs, 0 skips
>> >> 238050 dezicycles in sqrtf, 8 runs, 0 skips
>> >> 236553 dezicycles in sqrtf, 16 runs, 0 skips
>> >> 235676 dezicycles in sqrtf, 32 runs, 0 skips
>> >> 234884 dezicycles in sqrtf, 64 runs, 0 skips
>> >> 234129 dezicycles in sqrtf, 128 runs, 0 skips
>> >> 234115 dezicycles in sqrtf, 256 runs, 0 skips
>> >> 234366 dezicycles in sqrtf, 512 runs, 0 skips
>> >> 233892 dezicycles in sqrtf, 1024 runs, 0 skips
>> >> 233624 dezicycles in sqrtf, 2047 runs, 1 skips
>> >> size=      -0kB time=59.99 bitrate=  -0.0kbits/s
>> >> video:0kB audio:5624kB global headers:0kB muxing overhead -100.000382%
>> >>
>> >> With Carmack-Lomont
>> >> alex at Barcelona:~/Projects/ffmpeg/14496-4$ ../ffmpeg/ffmpeg -i
>> >> mpeg4audio-conformance/compressedMp4/al18_48.mp4 -f null -
>> >> FFmpeg version SVN-r15584, Copyright (c) 2000-2008 Fabrice Bellard, et al.
>> >>   configuration: --enable-gpl
>> >>   libavutil     49.11. 0 / 49.11. 0
>> >>   libavcodec    52. 0. 0 / 52. 0. 0
>> >>   libavformat   52.22. 1 / 52.22. 1
>> >>   libavdevice   52. 1. 0 / 52. 1. 0
>> >>   built on Oct  7 2008 14:26:51, gcc: 4.3.2
>> >> Input #0, mov,mp4,m4a,3gp,3g2,mj2, from
>> >> 'mpeg4audio-conformance/compressedMp4/al18_48.mp4':
>> >>   Duration: 00:01:00.01, start: 0.000000, bitrate: 67 kb/s
>> >>     Stream #0.0(und): Audio: aac, 48000 Hz, mono, s16
>> >> Output #0, null, to 'pipe:':
>> >>     Stream #0.0(und): Audio: pcm_s16le, 48000 Hz, mono, s16, 768 kb/s
>> >> Stream mapping:
>> >>   Stream #0.0 -> #0.0
>> >> Press [q] to stop encoding
>> >> 197190 dezicycles in sqrtf, 1 runs, 0 skips
>> >> 190665 dezicycles in sqrtf, 2 runs, 0 skips
>> >> 187807 dezicycles in sqrtf, 4 runs, 0 skips
>> >> 185737 dezicycles in sqrtf, 8 runs, 0 skips
>> >> 184651 dezicycles in sqrtf, 16 runs, 0 skips
>> >> 184255 dezicycles in sqrtf, 32 runs, 0 skips
>> >> 183868 dezicycles in sqrtf, 64 runs, 0 skips
>> >> 183976 dezicycles in sqrtf, 128 runs, 0 skips
>> >> 183913 dezicycles in sqrtf, 256 runs, 0 skips
>> >> 184199 dezicycles in sqrtf, 511 runs, 1 skips
>> >> 184037 dezicycles in sqrtf, 1023 runs, 1 skips
>> >> 183925 dezicycles in sqrtf, 2047 runs, 1 skips
>> >> size=      -0kB time=59.99 bitrate=  -0.0kbits/s
>> >> video:0kB audio:5624kB global headers:0kB muxing overhead -100.000382%
>> >>
>> >> Intel(R) Core(TM)2 CPU 6600  @ 2.40GHz
>> >
>> > please try ff_sqrt() as well
>> >
>> >
>> > [...]
>> >> @@ -671,6 +671,17 @@
>> >>      }
>> >>  }
>> >>
>> >> +/** Fast Carmack-Lomont 1/sqrtf(), see Lomont 2003 */
>> >> +static float inv_sqrtf(float x) {
>> >> +    union { float f; int i; } pun;
>> >> +    float xhalf = 0.5f*x;
>> >> +    pun.f = x;
>> >> +    pun.i = 0x5f3759df - (pun.i>>1);
>> >> +    x = pun.f;
>> >> +    x = x*(1.5f-xhalf*x*x);
>> >> +    return x;
>> >> +}
>> >> +
>> >>  /**
>> >>   * Decode spectral data; reference: table 4.50.
>> >>   * Dequantize and scale spectral data; reference: 4.6.3.3.
>> >
>> > that stuff is not portable, it contains assumptions about endianness
>> > and sizes of int and float.
>> >
>>
>> Are you sure it's endian specific?
>
> I am sure it will not work if float and int differ in endianness.
>
>
>>
>> "This note assumes PC architecture (32 bit floats and ints) or
>> similar. In particular the floating point representation is IEEE
>> 754-1985 [7]. This C code has been reported to be endian-neutral
>> (supposedly tested it on a Macintosh). I have not verified this. Since
>> the method works on 32 bit numbers it seems (surprisingly)
>> endian-neutral." -Lomont 2003
>>
>> Is there an integer size guaranteed to be sizeof(float)?
>
> int32_t is guaranteed to be 32bit and the code will not work
> if float is not 32 bit
>
>
>>
>> Does ffmpeg target any non IEEE-754 machines?
>
> ffmpeg should work on mostly POSIX- mostly ISO C- twos complement machines
> that does not imply IEEE i think ...
> also some things like some ARM have no FPUs and use software emulation
> i do not know how close they are to IEEE-754, for sw emu it can make
> sense to not conform to IEEE to improve speed ...
>

I was under the impression that arm soft float was 754... or at least
754 enough for these purposes. But I don't have a setup to test it on
right now.
Either way, punning from 754 floats is a very useful thing; maybe we
should add a check for 754 in configure?
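
To make the idea of a 754 check concrete, here is a rough sketch of a
run-time probe that a configure test program could build on. It is not
an existing FFmpeg check; the function name float_is_ieee754_single is
hypothetical. It only compares a few known bit patterns, so it detects
"32-bit single with the same byte order as the integer type", not full
IEEE-754 conformance.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if float appears to be a 32-bit
 * IEEE-754 single stored in the same byte order as uint32_t.
 * It checks a handful of known encodings, nothing more. */
static int float_is_ieee754_single(void)
{
    static const float    probe[]  = { 1.0f,        -2.0f,       0.15625f    };
    static const uint32_t expect[] = { 0x3f800000u, 0xc0000000u, 0x3e200000u };
    size_t n;

    if (sizeof(float) != sizeof(uint32_t))
        return 0;                       /* size mismatch: punning is unsafe */

    for (n = 0; n < sizeof(probe) / sizeof(probe[0]); n++) {
        uint32_t bits;
        memcpy(&bits, &probe[n], sizeof(bits)); /* copy the raw bytes */
        if (bits != expect[n])
            return 0;                   /* different format or byte order */
    }
    return 1;
}
```

A float/int endianness mismatch would show up here as a failed
comparison, which is the case Michael is worried about.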

Attached is a version that explicitly uses an int32_t.
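
For readers without the attachment, the int32_t variant looks roughly
like the sketch below. This is an illustration, not the code from the
attached diff: the name inv_sqrtf_i32 is made up here, and memcpy is
used in place of the union as one way to copy the bits. It still
assumes a 32-bit IEEE-754 float with the same byte order as int32_t,
and x > 0.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative Carmack-Lomont 1/sqrtf() using a fixed-width integer.
 * Assumes float is a 32-bit IEEE-754 single and x is positive. */
static float inv_sqrtf_i32(float x)
{
    int32_t i;
    float   xhalf = 0.5f * x;
    float   y;

    memcpy(&i, &x, sizeof(i));          /* reinterpret the float's bits */
    i = 0x5f3759df - (i >> 1);          /* magic-constant initial guess */
    memcpy(&y, &i, sizeof(y));          /* back to float */
    return y * (1.5f - xhalf * y * y);  /* one Newton-Raphson step */
}
```

With a single Newton step the result is only approximate (on the order
of a few tenths of a percent off), so whether it is accurate enough for
PNS conformance would need to be checked against the test vectors.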

--Alex
-------------- next part --------------
A non-text attachment was scrubbed...
Name: aac-pns-scale-per-band-v2.diff
Type: text/x-diff
Size: 2541 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20081007/474c1f56/attachment.diff>