[FFmpeg-devel] [PATCH 2/3] avcodec/aacsbr: Add comment about possibly optimization in sbr_dequant()

Sat Dec 12 23:59:14 CET 2015

On Sat, Dec 12, 2015 at 05:24:34PM -0500, Ganesh Ajjanagadde wrote:
> On Sat, Dec 12, 2015 at 1:17 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> [...]
> 
> >> >> >
> >> >> > The exp2f expressions are:
> >> >> > exp2f(sbr->data[0].env_facs_q[e][k] * alpha + 7.0f);
> >> >> > exp2f((pan_offset - sbr->data[1].env_facs_q[e][k]) * alpha);
> >> >> > exp2f(NOISE_FLOOR_OFFSET - sbr->data[0].noise_facs_q[e][k] + 1);
> >> >> > exp2f(12 - sbr->data[1].noise_facs_q[e][k]);
> >> >> > exp2f(alpha * sbr->data[ch].env_facs_q[e][k] + 6.0f);
> >> >> > exp2f(NOISE_FLOOR_OFFSET - sbr->data[ch].noise_facs_q[e][k]);
> >> >> >
> >> >> > Here alpha is 1 or 0.5, pan_offset 12 or 24 and NOISE_FLOOR_OFFSET is 6.
> >> >> > After patch 3 of this series, env_facs_q is in the range from 0 to 127 and
> >> >> > noise_facs_q is already limited to the range from 0 to 30.
> >> >> >
> >> >> > So x should always be in the range -300..300, or so.
> >> >>
> >> >> Very good, thanks a lot.
> >> >>
> >> >> Based on the above range, my idea is to not even use a LUT, but use
> >> >> something like exp2fi followed by multiplication by M_SQRT2 depending
> >> >> on even or odd.
> >> >
> >> > conditional operations can due to branch misprediction be potentially
> >> > rather slow
> >>
> >> I think it will still be far faster than exp2f, and in the absence of
> >> hard numbers, I view this a far better approach than a large (~300
> >> element) lut. Of course, the proof and extent of this will need to
> >> wait for actual benches.
> >
> > alternatively one could do a
> > if (x+A < (unsigned)B)
> >     LUT[x+A]
> > else
> >     exp2whatever(x)
> >
> > the range in practice should be much smaller than +-300
> 
> That still uses a branch, so unless for whatever reason the numbers
> tend to concentrate in an interval (which you believe but I am
> agnostic about since I don't know AAC), this is code complexity for

There's an easy way to find out: just print the numbers and pipe them
through
sort -n | uniq -c
I didn't try it, but I expect most of the range to have 0 hits.
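
If piping debug output through sort/uniq is inconvenient, the same count can
be kept inside the process. The following is only a sketch: the names
hist_add() and hist_dump() and the HIST_MIN/HIST_MAX bounds are hypothetical,
with the bounds taken from the -300..300 estimate quoted earlier. The idea is
to call hist_add() on every integer exponent that would be fed to exp2f and to
dump the non-empty bins at exit.

```c
#include <stdio.h>

/* Hypothetical bounds, taken from the "-300..300, or so" estimate. */
#define HIST_MIN (-300)
#define HIST_MAX   300

static unsigned long hist[HIST_MAX - HIST_MIN + 1];
static unsigned long hist_out_of_range;

/* Record one integer exponent value. */
static void hist_add(int x)
{
    if (x < HIST_MIN || x > HIST_MAX)
        hist_out_of_range++;
    else
        hist[x - HIST_MIN]++;
}

/* Print "count value" for every value that was seen at least once,
 * in ascending order of value (similar in spirit to sort -n | uniq -c).
 * Returns the number of distinct values seen inside the range. */
static int hist_dump(FILE *out)
{
    int distinct = 0;

    for (int x = HIST_MIN; x <= HIST_MAX; x++) {
        if (hist[x - HIST_MIN]) {
            fprintf(out, "%7lu %d\n", hist[x - HIST_MIN], x);
            distinct++;
        }
    }
    if (hist_out_of_range)
        fprintf(out, "%7lu (out of range)\n", hist_out_of_range);
    return distinct;
}
```

If only a handful of bins come out non-zero, a small table covering just
those values becomes much easier to justify.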

> little gain. Furthermore, that B then becomes a "voodoo" constant, and
> performance may vary across files depending on the degree of
> concentration of the inputs. I personally don't like such things
> unless they are very well justified after all other easy, uniform
> methods of optimization are exhausted.
> 
> >
> > also the LUT can possibly be shared between codecs
> 
> This is something for which there is plenty of low hanging fruit
> across FFmpeg. The best example I know of is the sin/cos tables used
> across dct, fft, and other areas: a lot of them can be derived from
> sin(2*pi*i/65536) for 0 <= i <= 65536/4, and cheap runtime derivation
> via indexing and symmetry exploitation (eg 16*i, 32*i, flip the sign).
> I will work on this only after other things are sorted out; so in the
> meantime it is up for grabs.
> 
> Complications come from threading issues for the case of dynamic init,
> since one needs to place a lock of some sort to avoid write
> contention. And --enable-hardcoded-tables does not help at all, as it
> distracts from the key issues since one needs to reason about both
> cases.

you do not "need" a lock

The most standard way to add a table is to type the numbers of its
entries into a source file and commit that.
If the table is to be generated at build time, then some complexity is
added by the generator that has to run during the build.
If it is built at runtime, then it can no longer be shared between
processes, and it needs to be written to disk in case it is paged out.
A "hardcoded" table resides in read-only memory and is simply discarded
on page out; the kernel can read it directly from the executable
again when it is next accessed.
One can also support both runtime and build-time tables, so the user
can choose.
A manually hardcoded table without a generator is harder to update,
though, and both it and the build-generated ones result in larger object
and executable files.
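
To make the two variants concrete, here is a minimal sketch. It is not code
from the tree: USE_HARDCODED_TABLE, exp2_tab and exp2_tab_init() are made-up
names, and the 8-entry table of powers of two is only a placeholder. The
hardcoded variant is const data that a typical toolchain places in a
read-only section; the runtime variant is writable memory that each process
fills in for itself.

```c
#include <math.h>

#define TAB_SIZE 8

#ifdef USE_HARDCODED_TABLE   /* hypothetical build-time switch */

/* Variant 1: entries typed (or generated) into the source.
 * Being const, the data normally lands in a read-only section
 * that is backed by the executable file itself. */
static const float exp2_tab[TAB_SIZE] = {
    1.0f, 2.0f, 4.0f, 8.0f, 16.0f, 32.0f, 64.0f, 128.0f
};

static void exp2_tab_init(void)
{
    /* nothing to do, the table is ready at load time */
}

#else

/* Variant 2: filled in at runtime. The table is private, writable
 * memory of the process. */
static float exp2_tab[TAB_SIZE];

static void exp2_tab_init(void)
{
    /* Every call writes the same values, so repeating it is harmless
     * for the contents; this sketch assumes it is called before any
     * reader thread starts. */
    for (int i = 0; i < TAB_SIZE; i++)
        exp2_tab[i] = ldexpf(1.0f, i);
}

#endif
```

Callers look the same in both cases: call exp2_tab_init() once during codec
init, then index exp2_tab[].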

> 
> Unfortunately, the question of static vs dynamic tables is something
> that can easily get bikeshedded to death. I am trying to do

Yes, everyone has a different use case in mind, and a different variant
is optimal for each.
In the end there is not a big difference between them, I think,
especially not for small tables. For big tables, each solution could have
more significant disadvantages relative to what is optimal
for someone else's use case.

Anyway, you could try exp2whatever(x>>1) * tab[x&1]
or something like that.
That may be faster IFF the compiler turns the if(x&1) into bad
code and if the values actually fluctuate enough.
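
Spelled out, the branch-free form could look like the sketch below. The
names exp2_half() and exp2_half_tab are hypothetical, ldexpf() from <math.h>
stands in for "exp2whatever", and the code assumes that x is an integer whose
half is the wanted exponent (the alpha = 0.5 case). It also assumes two's
complement and an arithmetic right shift for negative x, which C leaves
implementation-defined.

```c
#include <math.h>

/* tab[x & 1]: 1 for even x, sqrt(2) for odd x */
static const float exp2_half_tab[2] = { 1.0f, 1.41421356237309504880f };

/* Computes 2^(x/2) for integer x without a conditional.
 * x >> 1 is floor(x / 2) under the assumptions above, and x & 1 is the
 * remaining half step, so 2^(x/2) = 2^(x >> 1) * sqrt(2)^(x & 1). */
static float exp2_half(int x)
{
    return ldexpf(1.0f, x >> 1) * exp2_half_tab[x & 1];
}
```

For example, exp2_half(-3) evaluates as 2^-2 * sqrt(2), i.e. 2^-1.5. Whether
the table load actually beats a well-predicted branch would still have to be
benchmarked.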

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

I have often repented speaking, but never of holding my tongue.
-- Xenocrates