[FFmpeg-devel] [PATCH] lavc/aacenc_utils: unroll quantize_bands loop

Rostislav Pehlivanov atomnuker at gmail.com
Wed Mar 23 03:27:46 CET 2016


On 22 March 2016 at 19:28, Ganesh Ajjanagadde <gajjanag at gmail.com> wrote:

> On Tue, Mar 22, 2016 at 12:09 PM, Rostislav Pehlivanov
> <atomnuker at gmail.com> wrote:
> > On 22 March 2016 at 17:33, Ganesh Ajjanagadde <gajjanag at gmail.com> wrote:
> >
> >> On Sat, Mar 19, 2016 at 2:36 AM, Hendrik Leppkes <h.leppkes at gmail.com>
> >> wrote:
> >> > On Sat, Mar 19, 2016 at 3:27 AM, Ganesh Ajjanagadde <gajjanag at gmail.com> wrote:
> >> >> Yields speedup in quantize_bands, and non-negligible speedup in aac encoding overall.
> >> >>
> >> >> Sample benchmark (Haswell, -march=native + GCC):
> >> >> new:
> >> >>     [...]
> >> >>     553 decicycles in quantize_bands, 2097136 runs,     16 skips9x
> >> >>     554 decicycles in quantize_bands, 4194266 runs,     38 skips8x
> >> >>     559 decicycles in quantize_bands, 8388534 runs,     74 skips7x
> >> >>
> >> >> old:
> >> >>     [...]
> >> >>     711 decicycles in quantize_bands, 2097140 runs,     12 skips7x
> >> >>     713 decicycles in quantize_bands, 4194277 runs,     27 skips4x
> >> >>     715 decicycles in quantize_bands, 8388538 runs,     70 skips3x
> >> >>
> >> >> old:
> >> >> ffmpeg -f lavfi -i anoisesrc -t 300 -y sin_new.aac  4.58s user 0.01s system 99% cpu 4.590 total
> >> >>
> >> >> new:
> >> >> ffmpeg -f lavfi -i anoisesrc -t 300 -y sin_new.aac  4.54s user 0.02s system 99% cpu 4.566 total
> >> >>
> >> >> Signed-off-by: Ganesh Ajjanagadde <gajjanag at gmail.com>
> >> >> ---
> >> >>  libavcodec/aacenc_utils.h | 33 +++++++++++++++++++++++++--------
> >> >>  1 file changed, 25 insertions(+), 8 deletions(-)
> >> >>
> >> >> diff --git a/libavcodec/aacenc_utils.h b/libavcodec/aacenc_utils.h
> >> >> index 38636e5..0203b6e 100644
> >> >> --- a/libavcodec/aacenc_utils.h
> >> >> +++ b/libavcodec/aacenc_utils.h
> >> >> @@ -62,18 +62,35 @@ static inline int quant(float coef, const float Q, const float rounding)
> >> >>      return sqrtf(a * sqrtf(a)) + rounding;
> >> >>  }
> >> >>
> >> >> +static inline float minf(float x, float y) {
> >> >> +    return x < y ? x : y;
> >> >> +}
> >> >> +
> >> >
> >> > That's exactly what the FFMIN macro expands to, what's the reason for
> >> > introducing this function?
> >>
> >> There was some compilation difference, in particular this was faster.
> >> No idea why, maybe some repeated evaluation of qc + rounding?
> >>
> >>
> > "No idea why" is not even remotely a valid excuse to have your own
> > function which does exactly what FFMIN does.
>
> Please read completely before posting, I gave a reason right above
> that. More verbose description given below.
>
> 1. FFMIN is a macro, this is a function. FFMIN etc. have problems with
> recomputation: a call like FFMIN(f(), g()) may evaluate one of its
> arguments twice, which adds mental overhead.
> 2. I actually ran into a slowdown while playing with this function,
> reaffirming point 1. That slowdown no longer exists when I
> restructured the code into its current form.
>
> Anyway, why are we even discussing this? I am not going to play the
> game of submitting to each and every whim, taking the effort of
> posting new patches, only to have a final statement like "0.1% is not
> worth it". It wastes time for both of us.
>
> In other words, let me know if you actually want to move forward with
> both of these. Else, I will drop them.
>
>
Drop them. I've started work on some SIMD which should boost performance by
quite a lot.
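
For reference, the recomputation concern in point 1 of the quoted message can be sketched as below. This is only an illustration, not code from the patch: MIN_MACRO is written to have the same shape as FFMIN, minf is copied from the quoted diff, and the counter and the counted(), calls_with_macro() and calls_with_function() helpers are hypothetical names introduced here. Whether this affects quantize_bands in practice depends on the compiler, which can often remove the repeated evaluation for side-effect-free expressions such as qc + rounding.

```c
#include <stdio.h>

/* Same shape as a ternary min macro such as FFMIN: the argument that
 * ends up selected is evaluated a second time. */
#define MIN_MACRO(a, b) ((a) > (b) ? (b) : (a))

/* Function form, as in the quoted diff: each argument is evaluated once. */
static inline float minf(float x, float y)
{
    return x < y ? x : y;
}

/* Hypothetical call counter, used only to make the evaluations visible. */
static int calls;

static float counted(float v)
{
    calls++;
    return v;
}

/* Number of counted() calls when the minimum is taken through the macro. */
static int calls_with_macro(void)
{
    calls = 0;
    float m = MIN_MACRO(counted(1.0f), counted(2.0f));
    (void)m;
    return calls;
}

/* Number of counted() calls when the minimum is taken through minf(). */
static int calls_with_function(void)
{
    calls = 0;
    float m = minf(counted(1.0f), counted(2.0f));
    (void)m;
    return calls;
}
```

Calling calls_with_macro() and calls_with_function() from a small main() and printing the results shows one extra evaluation for the macro form. The extra call only matters when the arguments are expensive or have side effects.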

