[FFmpeg-devel] [PATCH] avcodec/nellymoserenc: avoid wasteful pow
Ganesh Ajjanagadde
gajjanag at mit.edu
Fri Dec 18 16:45:45 CET 2015
On Fri, Dec 18, 2015 at 1:11 AM, Kacper Michajlow <kasper93 at gmail.com> wrote:
> On Dec 18, 2015 10:06 AM, "Kacper Michajlow" <kasper93 at gmail.com> wrote:
>>
>> One minor nitpick about the commit message: you could mention which compiler
>> was used to generate the code for the benchmark. For example, Clang 3.7 replaces
>> pow(2,...) with an exp2(...) call by itself, so you probably used gcc.
>> Anyway, since it is already merged, take my reply as a hint for next
>> time :)
Thanks: yes, I have been sloppy about this.
>>
>> Regards,
>> Kacper
>>
>> On Dec 17, 2015 5:14 PM, "Ganesh Ajjanagadde" <gajjanag at mit.edu> wrote:
>>>
>>> On Tue, Dec 15, 2015 at 6:40 PM, Ganesh Ajjanagadde <gajjanag at mit.edu> wrote:
>>> > On Tue, Dec 15, 2015 at 5:25 PM, Ganesh Ajjanagadde <gajjanag at mit.edu> wrote:
>>> >> On Tue, Dec 15, 2015 at 2:23 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
>>> >>> On Wed, Dec 09, 2015 at 06:55:25PM -0500, Ganesh Ajjanagadde wrote:
>>> > [...]
>>> >>>>
>>> >>>> diff --git a/libavcodec/nellymoserenc.c b/libavcodec/nellymoserenc.c
>>> >>>> index d998dba..e6023e3 100644
>>> >>>> --- a/libavcodec/nellymoserenc.c
>>> >>>> +++ b/libavcodec/nellymoserenc.c
>>> >>>> @@ -179,8 +179,15 @@ static av_cold int encode_init(AVCodecContext *avctx)
>>> >>>>
>>> >>>>      /* Generate overlap window */
>>> >>>>      ff_init_ff_sine_windows(7);
>>> >>>> -    for (i = 0; i < POW_TABLE_SIZE; i++)
>>> >>>> -        pow_table[i] = pow(2, -i / 2048.0 - 3.0 + POW_TABLE_OFFSET);
>>> >>>> +    pow_table[0] = 1;
>>> >>>> +    pow_table[1024] = M_SQRT1_2;
>>> >>>> +    for (i = 1; i < 513; i++) {
>>> >>>> +        double tmp = exp2(-i / 2048.0);
>>> >>>> +        pow_table[i] = tmp;
>>> >>>> +        pow_table[1024-i] = M_SQRT1_2 / tmp;
>>> >>>> +        pow_table[1024+i] = tmp * M_SQRT1_2;
>>> >>>> +        pow_table[2048-i] = 0.5 / tmp;
>>> >>>
>>> >>> how much overall init time is gained by this ?
>>> >>> that is time in ffmpeg main() from start to finish when just opening
>>> >>> the file with no decoding aka ./ffmpeg -i somefile
>>> >>
>>> >> Don't know, all I know is cycles are unnecessarily wasted. Will put in
>>> >> cycle numbers.
>>> >>
>>> >
>>> > Here they are:
>>> > proposed: 424160 decicycles in pow_table, 512 runs, 0 skips
>>> > exp2 only: 1262093 decicycles in pow_table, 512 runs, 0 skips
>>> > old: 2849085 decicycles in pow_table, 512 runs, 0 skips
>>> >
>>> > Thus old to exp2 is roughly 2.25x speedup, exp2 to proposed roughly 3x
>>> > speedup, net ~ 6.7x speedup.
>>>
>>> took Michael's comment as a general ack, so pushed with addition of a
>>> comment and cycle numbers.
>>> _______________________________________________
>>> ffmpeg-devel mailing list
>>> ffmpeg-devel at ffmpeg.org
>>> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> Sorry for top post.
No problem.
>
> -Kacper