[FFmpeg-devel] [aarch64] improve hscale by 50% with multi-threading

Sebastian Pop sebpop at gmail.com
Sat Jul 18 07:08:02 EEST 2020


hscale is bound by the number of multiply-adds a single core can execute.
The attached patch doubles the available multiply-add throughput by
distributing half of the load to a helper thread.
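
To make the idea concrete, here is a minimal sketch of the split in Python. It is not the attached patch, which targets aarch64: the function names, the shift by 7 and the 15-bit clip are assumptions modelled on a generic FIR horizontal scaler. Python threads will not show the speedup because of the interpreter lock; the sketch only illustrates how the output row is partitioned between the calling thread and a helper thread.

```python
import threading

def hscale_range(dst, src, filt, filter_pos, filter_size, start, end):
    """Scale output pixels [start, end): one multiply-add per filter tap."""
    for i in range(start, end):
        acc = 0
        for j in range(filter_size):
            acc += src[filter_pos[i] + j] * filt[i * filter_size + j]
        # Assumed normalisation: shift by 7 and clip to 15 bits.
        dst[i] = min(acc >> 7, (1 << 15) - 1)

def hscale_two_threads(dst, src, filt, filter_pos, filter_size):
    """Hypothetical split: the helper thread computes the second half."""
    mid = len(dst) // 2
    helper = threading.Thread(
        target=hscale_range,
        args=(dst, src, filt, filter_pos, filter_size, mid, len(dst)),
    )
    helper.start()
    # The calling thread computes the first half in the meantime.
    hscale_range(dst, src, filt, filter_pos, filter_size, 0, mid)
    helper.join()

# Toy input: 8 source pixels reduced to 4 output pixels with a 2-tap filter.
src = [10, 20, 30, 40, 50, 60, 70, 80]
filter_size = 2
filter_pos = [0, 2, 4, 6]
filt = [64, 64] * 4          # taps sum to 128, i.e. 1.0 after the shift by 7

single = [0] * 4
hscale_range(single, src, filt, filter_pos, filter_size, 0, 4)

split = [0] * 4
hscale_two_threads(split, src, filt, filter_pos, filter_size)
```

Both halves write to disjoint ranges of dst, so no locking is needed beyond the final join.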

Performance improves by up to 50% on Graviton2 (Arm Neoverse-N1) processors.

$ ./ffmpeg_g -nostats -f lavfi -i testsrc2=4k:d=2 -vf bench=start,scale=1024x1024,bench=stop -f null -
before: [bench @ 0xaaaad62c3d30] t:0.013293 avg:0.013315 max:0.013697 min:0.013293
after:  [bench @ 0xaaaae9346d30] t:0.009637 avg:0.009691 max:0.010005 min:0.009637
38% improvement

scale=1280x720  49% improvement
before: [bench @ 0xaaaadba88d30] t:0.015973 avg:0.016321 max:0.016917 min:0.015973
after:  [bench @ 0xaaaabc78dd30] t:0.010823 avg:0.010869 max:0.011552 min:0.010708

scale=852x480  45% improvement
before: [bench @ 0xaaaaeeed0d30] t:0.013731 avg:0.013727 max:0.013773 min:0.013279
after:  [bench @ 0xaaaaf5f5dd30] t:0.009279 avg:0.009296 max:0.009328 min:0.009187

scale=640x360  45% improvement
before: [bench @ 0xaaaacee25d30] t:0.012010 avg:0.012006 max:0.012053 min:0.011653
after:  [bench @ 0xaaaaea2b5d30] t:0.008077 avg:0.008084 max:0.008409 min:0.008057

scale=284x160  36% improvement
before: [bench @ 0xaaaadbb9ed30] t:0.008384 avg:0.008367 max:0.008421 min:0.008193
after:  [bench @ 0xaaaafb1d6d30] t:0.006099 avg:0.006100 max:0.006120 min:0.006026
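
The post does not say how the percentages were derived. As a check, assuming each one is the speedup computed from the min times (before / after - 1), the following sketch reproduces all five reported figures. The function name is only illustrative.

```python
# min times in seconds (before, after), copied from the bench output above
min_times = {
    "1024x1024": (0.013293, 0.009637),
    "1280x720": (0.015973, 0.010708),
    "852x480": (0.013279, 0.009187),
    "640x360": (0.011653, 0.008057),
    "284x160": (0.008193, 0.006026),
}

def improvement_percent(before, after):
    """Speedup expressed as a percentage: how much more work per unit time."""
    return (before / after - 1.0) * 100.0

for size, (before, after) in min_times.items():
    print(f"scale={size}  {improvement_percent(before, after):.0f}% improvement")
```

Note that this is a throughput gain, not a reduction in run time: a 49% improvement corresponds to each bench interval taking about 33% less time.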
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-aarch64-improve-hscale-by-50-with-multi-threading.patch
Type: application/octet-stream
Size: 5945 bytes
Desc: not available
URL: <https://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20200717/8960b308/attachment.obj>

