[FFmpeg-devel] [PATCH 3/3] lavc/cbrt_tablegen: unroll table generation loop

Thu Dec 31 18:20:48 CET 2015

On Thu, Dec 31, 2015 at 8:46 AM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
> Hi,
>
> On Thu, Dec 31, 2015 at 11:39 AM, Ganesh Ajjanagadde
> <gajjanagadde at gmail.com> wrote:
>>
>> This patch does not seem to have measurable impact, at least on x86-64,
>> though
>> there could be benefits for less than stellar branch predictors.
>
> [..]
>>
>> -        for (i = 0; i < 1<<13; i++) {
>> -            if (!(i & 7))
>> -                cbrt_tab[i].f = 16 * cbrt_tab[i>>3].f;
>> -            else
>> -                cbrt_tab[i].f = i * cbrt(i);
>> +        for (i = 0; i < 1<<13; i+=8) {
>> +            cbrt_tab[i].f = 16 * cbrt_tab[i>>3].f;
>> +            cbrt_tab[i+1].f = (i+1) * cbrt(i+1);
>> +            cbrt_tab[i+2].f = (i+2) * cbrt(i+2);
>> +            cbrt_tab[i+3].f = (i+3) * cbrt(i+3);
>> +            cbrt_tab[i+4].f = (i+4) * cbrt(i+4);
>> +            cbrt_tab[i+5].f = (i+5) * cbrt(i+5);
>> +            cbrt_tab[i+6].f = (i+6) * cbrt(i+6);
>> +            cbrt_tab[i+7].f = (i+7) * cbrt(i+7);
>
>
> gcc (and most other compilers) will unroll the loop automatically, I
> suspect. Check disassembly to confirm?

checked, it does not on gcc at least. I would in fact suspect the
opposite;  this increases the binary size non-negligibly (unlike an
unrolling by two for instance), and is slightly nontrivial for a
compiler due to the i&7 business.

>
> (That doesn't mean the patch shouldn't go in, I'm just trying to help you
> explain the result. I have no comment on the patch itself.)

I think the branch predictor explanation is reasonable, these are very
predictable due to the relatively long length of the loop, and simple,
periodic branches. On the other hand, this varies considerably across
even generations of intel CPU's; hence I posted the patch. Thanks for
the idea though.

>
> Ronald