[FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS

Tue Oct 13 07:03:02 CEST 2015

On Tue, Oct 13, 2015 at 12:44 AM, Carl Eugen Hoyos <cehoyos at ag.or.at> wrote:
> Ganesh Ajjanagadde <gajjanag <at> mit.edu> writes:
>> On Tue, Oct 13, 2015 at 12:16 AM, Carl Eugen Hoyos wrote:
>> > Ganesh Ajjanagadde <gajjanag <at> mit.edu> writes:
>> >
>> >> Bench from libavfilter/astats on a 15 min clip.
>> >
>> > I believe that your test would indicate that the
>> > old variant is faster or that no result can be
>> > given which is what my tests show.
>>
>> Look at the bench and the numbers again; I have
>> provided them above.
>
> Ok:
> old:
>     389 decicycles in abs,      64 runs,      0 skips
>     350 decicycles in abs,     128 runs,      0 skips
>     331 decicycles in abs,     256 runs,      0 skips
>     321 decicycles in abs,     512 runs,      0 skips
>     319 decicycles in abs,    1024 runs,      0 skips
>     318 decicycles in abs,    2048 runs,      0 skips
>     315 decicycles in abs,    4096 runs,      0 skips
>     317 decicycles in abs,    8192 runs,      0 skips
>     335 decicycles in abs,   16384 runs,      0 skips
>     335 decicycles in abs,   32768 runs,      0 skips
>
> new:
>     382 decicycles in abs,      64 runs,      0 skips
>     361 decicycles in abs,     128 runs,      0 skips
>     356 decicycles in abs,     256 runs,      0 skips
>     334 decicycles in abs,     512 runs,      0 skips
>     322 decicycles in abs,    1024 runs,      0 skips
>     317 decicycles in abs,    2048 runs,      0 skips
>     315 decicycles in abs,    4096 runs,      0 skips
>     341 decicycles in abs,    8192 runs,      0 skips
>     363 decicycles in abs,   16383 runs,      1 skips
>     342 decicycles in abs,   32767 runs,      1 skips
> Numbers with high skips or low runs are not so
> relevant afaik.

Not so relevant, but as I said: it is still better.

>
>> They are essentially identical in the best case
>> (highest number of runs); the new variant is faster in
>> the worst case.
>
> I would say the opposite is true but we can certainly
> agree that there is no proof that one is faster.

Do a random float test, the difference is more pronounced.
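
A rough sketch of such a random float test, for illustration only. The
FFABS definition below is a local copy assumed to match the one in
libavutil; the helper names (fill_random, sum_abs_macro, sum_abs_fabs,
time_variant) are hypothetical and not part of FFmpeg. Timing uses
clock() rather than FFmpeg's own timer macros, so the numbers are not
directly comparable to the decicycle figures quoted above.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Local copy for illustration; assumed to match libavutil's FFABS. */
#define FFABS(a) ((a) >= 0 ? (a) : (-(a)))

/* Fill buf with n pseudo-random doubles in [-1, 1], so that the sign
 * of each sample is hard to predict. */
static void fill_random(double *buf, size_t n, unsigned seed)
{
    assert(buf != NULL);
    srand(seed);
    for (size_t i = 0; i < n; i++)
        buf[i] = 2.0 * rand() / (double)RAND_MAX - 1.0;
}

/* Sum of absolute values using the macro. */
static double sum_abs_macro(const double *buf, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += FFABS(buf[i]);
    return acc;
}

/* Sum of absolute values using fabs() from <math.h>. */
static double sum_abs_fabs(const double *buf, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += fabs(buf[i]);
    return acc;
}

/* Return the CPU seconds spent in `reps` calls of fn over buf. The
 * accumulated result is stored in *out so the work is not optimized
 * away. */
static double time_variant(double (*fn)(const double *, size_t),
                           const double *buf, size_t n, int reps,
                           double *out)
{
    clock_t start = clock();
    double acc = 0.0;
    for (int r = 0; r < reps; r++)
        acc += fn(buf, n);
    *out = acc;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

To use it, fill one large buffer with fill_random(), then call
time_variant() once with sum_abs_macro and once with sum_abs_fabs on
the same buffer and compare the returned times. Whether a gap shows up
will depend on the compiler and flags, since either variant may be
compiled to the same instruction.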

>
>> You have not provided a bench proving otherwise.
>
> old:
> user    0m20.338s
> user    0m20.408s
> user    0m20.287s
> user    0m20.365s
> user    0m20.208s
> new:
> user    0m20.197s
> user    0m20.577s
> user    0m20.434s
> user    0m20.322s
> user    0m20.356s

The difference here is IMO too small to conclude anything. My point is
precisely this: on most inputs, there is no difference. On bad (worst
case) inputs, using fabs instead of the macro is far superior. The
random float bench proves this. Translating that to an audio file
should be easy: I suspect placing most samples near the silence value
(0) would do it.
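
A minimal sketch of how such near-silence input could be generated,
assuming that the "bad" case is one where samples sit close to 0 and
keep changing sign (this is an interpretation, not something measured
here). The names fill_near_silence and count_sign_changes and the
`amplitude` parameter are hypothetical, chosen for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Fill buf with n samples drawn uniformly from [-amplitude, amplitude].
 * With a small amplitude this is a near-silent signal whose sign keeps
 * flipping around the silence value 0. */
static void fill_near_silence(double *buf, size_t n, double amplitude,
                              unsigned seed)
{
    srand(seed);
    for (size_t i = 0; i < n; i++) {
        double u = rand() / (double)RAND_MAX;  /* uniform in [0, 1] */
        buf[i] = amplitude * (2.0 * u - 1.0);
    }
}

/* Count how often neighbouring samples differ in sign; a rough measure
 * of how unpredictable the sign is for this input. */
static size_t count_sign_changes(const double *buf, size_t n)
{
    size_t changes = 0;
    for (size_t i = 1; i < n; i++)
        if ((buf[i - 1] < 0) != (buf[i] < 0))
            changes++;
    return changes;
}
```

The resulting buffer could then be written out as raw float PCM and
run through the astats filter to see whether the gap from the random
float test carries over to an actual audio input.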

>
>> > I am not sure if it makes sense to apply a patch
>> > that is meant to improve speed if this improvement
>> > can't be shown.
>>
>> I believe I have shown it above clearly.
>
> Imo, you have shown clearly that neither variant can
> be shown to be faster.
>
> Carl Eugen