[FFmpeg-user] maximum number of CPUs or threads supported in decoding

Dennis Mungai dmngaie at gmail.com
Wed Dec 18 17:29:27 EET 2019


On Wed, 18 Dec 2019 at 17:33, heyufei2008 at outlook.com
<heyufei2008 at outlook.com> wrote:
>
> Hi Carl
>
> It seems that ffmpeg does not support NUMA and processor groups; is there a reason for that?
>
> John

Hey there,

This is supported, provided your build was configured with pthreads support.
However, you must also pin ffmpeg to specific processor and memory
nodes via numactl.
Do not rely on your operating system's automatic NUMA balancer for this.
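
A quick sanity check, assuming a typical Linux build: the configure
flags are baked into the binary, so pthreads will normally only show
up here if it was explicitly disabled (no output usually means the
default autodetection was used):

ffmpeg -buildconf 2>&1 | grep -i pthread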

Here's the GPU topology from nvidia-smi on a node running multiple
ffmpeg processes, each bound to the specific NUMA node its GPU is
attached to (the numactl -H and numastat snapshots follow):

nvidia-smi topo --matrix

Output:

    GPU0    GPU1    GPU2    GPU3    CPU Affinity
GPU0     X     SYS    SYS    SYS    0-23,48-71
GPU1    SYS     X     NODE    NODE    24-47,72-95
GPU2    SYS    NODE     X     PHB    24-47,72-95
GPU3    SYS    NODE    PHB     X     24-47,72-95

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect
between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect
between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge
(typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without
traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

Comparing that to the output from numactl -H:

available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
22 23 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68
69 70 71
node 0 size: 257625 MB
node 0 free: 82564 MB
node 1 cpus: 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42
43 44 45 46 47 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89
90 91 92 93 94 95
node 1 size: 258016 MB
node 1 free: 75488 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10
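
If you'd rather not infer the node from the CPU affinity column, the
kernel also exposes each GPU's NUMA node through sysfs. A rough
sketch (the PCI address below is a made-up example; substitute the
bus ID nvidia-smi reports for your card, lowercased and with the
domain trimmed to four digits):

nvidia-smi --query-gpu=index,pci.bus_id --format=csv
cat /sys/bus/pci/devices/0000:3b:00.0/numa_node

The second command prints the node number, or -1 if the platform
doesn't report one.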



Each GPU is then driven by an ffmpeg process tied to the same node
the GPU is attached to:

numastat -p ffmpeg

Per-node process memory usage (in MBs)
PID                        Node 0          Node 1           Total
----------------  --------------- --------------- ---------------
58223 (ffmpeg)            2102.72           24.04         2126.75
64285 (ffmpeg)              16.57         4989.87         5006.44
66707 (ffmpeg)              16.96         4998.27         5015.23
68719 (ffmpeg)              16.76         4989.13         5005.89
70800 (ffmpeg)              16.74         4997.01         5013.75
72017 (ffmpeg)              16.76         5015.60         5032.36
74203 (ffmpeg)              16.72         4946.99         4963.71
76597 (ffmpeg)              16.86         5025.52         5042.38
77722 (ffmpeg)            4970.77           24.21         4994.97
79603 (ffmpeg)              16.72         4995.20         5011.92
81882 (ffmpeg)            4984.33           24.14         5008.47
83570 (ffmpeg)              16.57         4990.14         5006.71
85616 (ffmpeg)            4963.34           24.50         4987.84
87655 (ffmpeg)              16.71         4971.09         4987.81
----------------  --------------- --------------- ---------------
Total                    17188.52        50015.71        67204.24

You'll see that the configured constraints are respected.

You'd need to call up numactl with the correct arguments to launch ffmpeg:

/usr/bin/numactl --membind=$nodes --cpunodebind=$nodes ffmpeg -args

etc.
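
To make that concrete, here's a rough sketch of launching one ffmpeg
per GPU, each bound to the node its GPU hangs off. The file names,
codec choices and the GPU-to-node mapping are placeholders based on
the topology above, not a recommendation:

#!/bin/bash
# GPU -> NUMA node mapping taken from the nvidia-smi topo output above:
# GPU0 sits on node 0, GPU1-3 sit on node 1.
declare -A gpu_node=( [0]=0 [1]=1 [2]=1 [3]=1 )

for gpu in "${!gpu_node[@]}"; do
    node=${gpu_node[$gpu]}
    # Bind CPU and memory to the GPU's node, then transcode on that GPU.
    numactl --membind="$node" --cpunodebind="$node" \
        ffmpeg -hwaccel cuda -hwaccel_device "$gpu" \
               -i "input_${gpu}.mp4" -c:v h264_nvenc "output_${gpu}.mp4" &
done
wait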

I cannot recommend taskset, as reliable pinning with it also requires
CPU isolation at boot via the isolcpus kernel parameter. Too much of a
hassle.
It's useful only if you need on-the-fly changes to CPU pinning
settings. Don't use taskset and numactl together; stick to one or the
other.
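
For completeness, if you do need to change the pinning of an already
running process (and again, not mixed with numactl), taskset can do it
in place; for the node-1 PIDs above that would look like:

taskset -cp 24-47,72-95 64285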

To confirm CPU and memory pinning for one of the PIDs above, check
its status file in /proc. For the CPU side:

grep Cpus_allowed_list /proc/64285/status

Output:

Cpus_allowed_list:    24-47,72-95

And that CPU list corresponds to node 1.
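
The memory side of the binding lives in the same status file; for the
same PID this should report only node 1, matching --membind:

grep Mems_allowed_list /proc/64285/status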

If you're using the libx265 encoder in FFmpeg, it is also NUMA-aware.
This requires x265 to be built against libnuma, i.e. the NUMA library
and its development headers (on Ubuntu, these are provided by the
libnuma1 and libnuma-dev packages).
See the threading docs for details:
https://x265.readthedocs.io/en/default/threading.html
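
As an illustration, x265's pools option can be combined with numactl
so the encoder only creates worker threads on the node you've bound
to. The file names below are placeholders, and the pools syntax should
be double-checked against the threading docs for your x265 version;
something like pools=-,+ is meant to skip the first node and use all
cores of the second:

numactl --membind=1 --cpunodebind=1 \
    ffmpeg -i input.mp4 -c:v libx265 -x265-params pools=-,+ output.mp4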

