[FFmpeg-user] maximum number of CPUs or threads supported in decoding
Dennis Mungai
dmngaie at gmail.com
Wed Dec 18 17:29:27 EET 2019
On Wed, 18 Dec 2019 at 17:33, heyufei2008 at outlook.com
<heyufei2008 at outlook.com> wrote:
>
> Hi Carl
>
> It seems that ffmpeg does not support NUMA and processor groups; is there a reason for that?
>
> John
Hey there,
This is supported, provided your build was configured with pthreads support.
However, you must also pin ffmpeg to specific processor and memory
nodes via numactl.
Do not rely on your operating system's automatic NUMA balancer for this.
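To quickly check the pthreads side, you can grep the build
configuration. A minimal sketch; pthreads is normally auto-detected,
so this only matches if the flag was passed explicitly at configure
time:

  # Look for an explicit pthreads toggle in the build configuration
  ffmpeg -hide_banner -buildconf 2>&1 | grep -i pthread

An empty result doesn't necessarily mean threading is absent, only
that it wasn't toggled by hand.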
Here's a snapshot of numastat from a node running multiple ffmpeg
processes bound to specific NUMA nodes matching the GPU topology from
nvidia-smi:
nvidia-smi topo --matrix

Output:

        GPU0   GPU1   GPU2   GPU3   CPU Affinity
GPU0     X     SYS    SYS    SYS    0-23,48-71
GPU1    SYS     X     NODE   NODE   24-47,72-95
GPU2    SYS    NODE    X     PHB    24-47,72-95
GPU3    SYS    NODE   PHB     X     24-47,72-95
Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect
         between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect
         between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge
         (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without
         traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks
Comparing that to the output from numactl -H:
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
22 23 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68
69 70 71
node 0 size: 257625 MB
node 0 free: 82564 MB
node 1 cpus: 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42
43 44 45 46 47 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89
90 91 92 93 94 95
node 1 size: 258016 MB
node 1 free: 75488 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10
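If you'd rather not match the affinity columns by eye, the kernel
exposes the same information in sysfs. A small sketch, assuming the
standard PCI sysfs layout and nvidia-smi's --query-gpu interface
(adjust paths to taste):

  # Print each GPU's PCI bus ID next to the CPUs local to its node
  for bdf in $(nvidia-smi --query-gpu=pci.bus_id --format=csv,noheader); do
      # nvidia-smi prints an 8-digit PCI domain (00000000:04:00.0);
      # sysfs wants 4 digits and lower-case hex (0000:04:00.0)
      dev=$(echo "$bdf" | cut -c5- | tr 'A-F' 'a-f')
      echo "$bdf -> CPUs $(cat /sys/bus/pci/devices/$dev/local_cpulist)"
  done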
Each GPU is then driven by its own ffmpeg process, tied to the same
NUMA node the GPU is attached to. numastat confirms the placement:
numastat -p ffmpeg
Per-node process memory usage (in MBs)
PID Node 0 Node 1 Total
---------------- --------------- --------------- ---------------
58223 (ffmpeg) 2102.72 24.04 2126.75
64285 (ffmpeg) 16.57 4989.87 5006.44
66707 (ffmpeg) 16.96 4998.27 5015.23
68719 (ffmpeg) 16.76 4989.13 5005.89
70800 (ffmpeg) 16.74 4997.01 5013.75
72017 (ffmpeg) 16.76 5015.60 5032.36
74203 (ffmpeg) 16.72 4946.99 4963.71
76597 (ffmpeg) 16.86 5025.52 5042.38
77722 (ffmpeg) 4970.77 24.21 4994.97
79603 (ffmpeg) 16.72 4995.20 5011.92
81882 (ffmpeg) 4984.33 24.14 5008.47
83570 (ffmpeg) 16.57 4990.14 5006.71
85616 (ffmpeg) 4963.34 24.50 4987.84
87655 (ffmpeg) 16.71 4971.09 4987.81
---------------- --------------- --------------- ---------------
Total 17188.52 50015.71 67204.24
You'll see that the configured constraints are respected.
You'd need to invoke numactl with the correct arguments when launching ffmpeg:
/usr/bin/numactl --membind=$nodes --cpunodebind=$nodes ffmpeg -args
etc.
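To make that concrete for the four-GPU box above, here's a rough
sketch of launching one pinned ffmpeg per GPU. The node mapping comes
from the topology output earlier; the input files and codec options
are placeholders, not a recommendation:

  # GPU 0 hangs off node 0; GPUs 1-3 hang off node 1 (see nvidia-smi topo)
  gpu_node=(0 1 1 1)
  for gpu in 0 1 2 3; do
      node=${gpu_node[$gpu]}
      numactl --cpunodebind=$node --membind=$node \
          ffmpeg -hwaccel cuda -hwaccel_device $gpu \
                 -i "input_$gpu.mp4" -c:v h264_nvenc "output_$gpu.mp4" &
  done
  wait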
I cannot recommend taskset for this, as it also requires explicit CPU
isolation at boot via the isolcpus kernel parameter. Too much of a
hassle. It is useful if, and only if, you need on-the-fly changes to
CPU pinning. Don't use taskset and numactl together; stick to one or
the other.
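For completeness, the on-the-fly case looks like this (the PID is one
of the examples above; the CPU list is node 1's from numactl -H):

  # Re-pin a running process to node 1's CPUs
  taskset -pc 24-47,72-95 64285

Keep in mind that taskset only moves CPU affinity; it does nothing
about where the process's memory is allocated, which is the other
half of the problem numactl solves.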
To confirm the node pinning for one of the PIDs above, I can run:
grep Cpus_allowed_list /proc/64285/status
Output:
Cpus_allowed_list: 24-47,72-95
And that CPU list corresponds to node 1.
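The memory binding can be checked from procfs as well. With --membind
in effect, the per-mapping policy in numa_maps reads bind:<node>
instead of default:

  # Show the first few memory mappings and their NUMA policy
  grep -m 5 bind /proc/64285/numa_maps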
If you're using the libx265 encoder in FFmpeg, note that this wrapper
is also NUMA-aware. It requires the NUMA development packages and
headers at build time (on Ubuntu, these are provided by the libnuma1
and libnuma-dev packages). See the x265 threading docs:
https://x265.readthedocs.io/en/default/threading.html
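If you want the encoder itself to keep its worker pools on one node,
x265's pools option can be passed through -x265-params. A hedged
example, matching the node 1 layout above ("-,+" disables pools on
the first node and allows them on the second; check the threading
docs for the exact semantics):

  ffmpeg -i input.mp4 -c:v libx265 -x265-params "pools=-,+" output.mp4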