[FFmpeg-user] NVDEC/NVENC resources underutilization
Garri Djavadyan
garryd at comnet.uz
Wed Feb 28 20:10:08 EET 2018
On 2018-02-28 19:53, Marcin Woźniak wrote:
> Try the same command but remove overlay filter and check.
I removed the filter and saw a slight increase in NVENC usage (33%). I
then ran the following checks with a minimal set of options.
-------------------
Full HW transcoding
-------------------
/usr/local/ffmpeg-dev/bin/ffmpeg -hwaccel cuvid -c:v mpeg4_cuvid \
-i input.avi -map 0:v:0 -c:v h264_nvenc -b:v 1024k -f null -
...
frame=69703 fps=3086 q=19.0 Lsize=N/A time=00:46:28.32 bitrate=N/A speed= 123x
CPU usage:
top - 22:37:15 up 22:26, 11 users, load average: 0.18, 0.35, 0.42
Threads: 1188 total, 2 running, 1185 sleeping, 0 stopped, 1 zombie
%Cpu0 : 21.7 us, 18.4 sy, 0.0 ni, 59.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 19.9 us, 17.8 sy, 0.0 ni, 62.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 1.3 us, 0.3 sy, 0.0 ni, 98.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 7.3 us, 5.6 sy, 0.0 ni, 87.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu4 : 0.0 us, 0.0 sy, 0.0 ni, 97.4 id, 2.6 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 : 1.0 us, 0.0 sy, 0.0 ni, 99.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu6 : 0.0 us, 8.6 sy, 0.0 ni, 91.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7 : 0.3 us, 1.0 sy, 0.0 ni, 98.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 12261384 total, 11810564 used, 450820 free, 1303564 buffers
KiB Swap: 7999484 total, 560488 used, 7438996 free. 5455608 cached Mem
  PID USER  PR  NI    VIRT    RES    SHR S  %CPU %MEM    TIME+ COMMAND    P
 6495 root  20   0 12.955g 116208 101472 R  83.9  0.9  0:06.99 `- ffmpeg  3
 6498 root  20   0 12.955g 116208 101472 S   0.0  0.9  0:00.00 `- ffmpeg  4
 6499 root  20   0 12.955g 116208 101472 S   7.6  0.9  0:00.59 `- ffmpeg  4
 6500 root  20   0 12.955g 116208 101472 S   0.0  0.9  0:00.00 `- ffmpeg  4
 6501 root  20   0 12.955g 116208 101472 S   0.0  0.9  0:00.00 `- ffmpeg  4
 6502 root  20   0 12.955g 116208 101472 S   0.0  0.9  0:00.00 `- ffmpeg  4
 6503 root  20   0 12.955g 116208 101472 S   0.0  0.9  0:00.00 `- ffmpeg  4
 6504 root  20   0 12.955g 116208 101472 S   0.0  0.9  0:00.00 `- ffmpeg  4
 6505 root  20   0 12.955g 116208 101472 S   0.0  0.9  0:00.00 `- ffmpeg  4
 6506 root  20   0 12.955g 116208 101472 S   0.0  0.9  0:00.00 `- ffmpeg  4
 6507 root  20   0 12.955g 116208 101472 S   0.0  0.9  0:00.00 `- ffmpeg  4
 6508 root  20   0 12.955g 116208 101472 S   0.0  0.9  0:00.00 `- ffmpeg  4
NVDEC/NVENC usage:
# nvidia-smi dmon
# gpu pwr temp sm mem enc dec mclk pclk
# Idx W C % % % % MHz MHz
0 49 49 10 14 99 63 3802 2012
0 49 50 9 14 99 59 3802 2012
0 49 50 14 14 99 62 3802 2012
0 49 50 10 14 99 62 3802 2012
0 48 50 12 14 97 62 3802 2012
0 48 51 9 14 99 63 3802 2012
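For comparing runs, it can help to average the enc and dec columns rather than eyeball them. The Python sketch below is only an illustration: the function name average_dmon is made up here, and the parsing assumes the two-line "#" header that nvidia-smi dmon printed above. It would need extra handling if a column contained a non-numeric placeholder.

```python
from statistics import mean

# Sample copied from the full-HW run above.
DMON_SAMPLE = """\
# gpu pwr temp sm mem enc dec mclk pclk
# Idx W C % % % % MHz MHz
0 49 49 10 14 99 63 3802 2012
0 49 50 9 14 99 59 3802 2012
0 49 50 14 14 99 62 3802 2012
0 49 50 10 14 99 62 3802 2012
0 48 50 12 14 97 62 3802 2012
0 48 51 9 14 99 63 3802 2012
"""


def average_dmon(text, columns=("enc", "dec")):
    """Average selected columns of `nvidia-smi dmon` text output.

    Hypothetical helper: assumes the first '#' line names the columns
    and every sample row holds plain numbers.
    """
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # First comment line = column names, second = units (ignored).
            if header is None:
                header = line.lstrip("#").split()
            continue
        rows.append(line.split())
    if header is None or not rows:
        raise ValueError("no dmon header or samples found")
    return {c: mean(float(r[header.index(c)]) for r in rows) for c in columns}


if __name__ == "__main__":
    print(average_dmon(DMON_SAMPLE))
```

The same function can be fed the output of the second run to compare the two averages directly.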
------------------------
Partially HW transcoding
------------------------
/usr/local/ffmpeg-dev/bin/ffmpeg -c:v mpeg4_cuvid -i input.avi \
-map 0:v:0 -c:v h264_nvenc -b:v 1024k -f null -
...
frame=69703 fps=2136 q=19.0 Lsize=N/A time=00:46:28.32 bitrate=N/A speed=85.4x
CPU usage:
top - 22:42:04 up 22:31, 11 users, load average: 0.23, 0.29, 0.39
Threads: 1185 total, 3 running, 1181 sleeping, 0 stopped, 1 zombie
%Cpu0 : 6.6 us, 2.4 sy, 0.0 ni, 91.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 9.1 us, 2.4 sy, 0.0 ni, 88.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 24.2 us, 2.0 sy, 0.0 ni, 73.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 6.0 us, 1.7 sy, 0.0 ni, 92.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu4 : 3.0 us, 0.7 sy, 0.0 ni, 96.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 : 12.2 us, 3.7 sy, 0.0 ni, 84.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu6 : 0.7 us, 15.4 sy, 0.0 ni, 84.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7 : 28.7 us, 2.0 sy, 0.0 ni, 69.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 12261384 total, 11916268 used, 345116 free, 1276148 buffers
KiB Swap: 7999484 total, 561988 used, 7437496 free. 5450584 cached Mem
  PID USER  PR  NI    VIRT    RES    SHR S  %CPU %MEM    TIME+ COMMAND    P
 6620 root  20   0 17.235g 211408 186860 R  74.2  1.7  0:08.31 `- ffmpeg  4
 6623 root  20   0 17.235g 211408 186860 S   0.0  1.7  0:00.00 `- ffmpeg  7
 6624 root  20   0 17.235g 211408 186860 S  13.1  1.7  0:01.36 `- ffmpeg  5
 6625 root  20   0 17.235g 211408 186860 S   0.0  1.7  0:00.00 `- ffmpeg  0
 6626 root  20   0 17.235g 211408 186860 S   0.0  1.7  0:00.00 `- ffmpeg  7
 6627 root  20   0 17.235g 211408 186860 S   0.0  1.7  0:00.00 `- ffmpeg  4
 6628 root  20   0 17.235g 211408 186860 S   0.0  1.7  0:00.00 `- ffmpeg  7
 6629 root  20   0 17.235g 211408 186860 S   0.0  1.7  0:00.00 `- ffmpeg  7
 6630 root  20   0 17.235g 211408 186860 S   0.0  1.7  0:00.00 `- ffmpeg  7
 6631 root  20   0 17.235g 211408 186860 S   0.0  1.7  0:00.00 `- ffmpeg  7
 6632 root  20   0 17.235g 211408 186860 S   0.0  1.7  0:00.00 `- ffmpeg  7
 6633 root  20   0 17.235g 211408 186860 S   0.0  1.7  0:00.00 `- ffmpeg  7
 6634 root  20   0 17.235g 211408 186860 S   7.6  1.7  0:00.81 `- ffmpeg  2
 6635 root  20   0 17.235g 211408 186860 S   0.0  1.7  0:00.00 `- ffmpeg  0
NVDEC/NVENC usage:
# nvidia-smi dmon
# gpu pwr temp sm mem enc dec mclk pclk
# Idx W C % % % % MHz MHz
0 46 49 34 10 69 42 3802 2012
0 46 50 34 10 70 42 3802 2012
0 46 50 34 10 70 42 3802 2012
0 46 50 34 10 72 42 3802 2012
0 46 50 34 10 72 42 3802 2012
0 46 50 34 10 70 44 3802 2012
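To put the two runs side by side, the small Python sketch below computes the relative drop in fps and in average enc utilization. It uses only figures copied from the outputs above; the helper name relative_drop is made up for illustration.

```python
def relative_drop(before, after):
    """Fractional decrease going from `before` to `after`."""
    return (before - after) / before


# Figures copied from the two runs above.
FULL_FPS, PARTIAL_FPS = 3086, 2136
FULL_ENC = [99, 99, 99, 99, 97, 99]      # enc %, full HW run
PARTIAL_ENC = [69, 70, 70, 72, 72, 70]   # enc %, partially HW run

fps_drop = relative_drop(FULL_FPS, PARTIAL_FPS)
enc_drop = relative_drop(
    sum(FULL_ENC) / len(FULL_ENC),
    sum(PARTIAL_ENC) / len(PARTIAL_ENC),
)

if __name__ == "__main__":
    print(f"fps drop: {fps_drop:.1%}, enc utilization drop: {enc_drop:.1%}")
```

Both measures fall by roughly the same proportion, which suggests the encoder is being starved of frames rather than running slower per frame, though these six samples alone do not prove that.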
I see a 30% NVENC performance loss when the transcoding path is NVDEC ->
RAM -> NVENC. Does this mean system memory bandwidth is the bottleneck in
this case? Or am I facing other unavoidable overheads?
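One way to get a feel for the scale of the extra copies is a back-of-envelope estimate of raw frame traffic. The Python sketch below is a rough illustration under loud assumptions: the resolution is HYPOTHETICAL (the size of input.avi is not stated in this thread), frames are assumed to be 8-bit NV12, and each frame is assumed to cross between GPU and system memory exactly once in each direction. The function names are made up for this example.

```python
def nv12_frame_bytes(width, height):
    """Bytes in one 8-bit NV12 frame: full-res luma plus half-size chroma."""
    return width * height * 3 // 2


def round_trip_bytes_per_second(width, height, fps):
    """Raw traffic if every frame is copied GPU -> RAM and RAM -> GPU once."""
    return 2 * nv12_frame_bytes(width, height) * fps


# HYPOTHETICAL resolution: the post does not give the size of input.avi.
WIDTH, HEIGHT = 720, 576
FPS = 2136  # measured rate of the partially HW run

rate = round_trip_bytes_per_second(WIDTH, HEIGHT, FPS)

if __name__ == "__main__":
    print(f"{rate / 1e9:.2f} GB/s of raw frame copies")
```

The resulting figure is only meaningful when set against the measured PCIe and memory bandwidth of the machine in question, and it says nothing about per-copy latency or synchronization costs, which may matter as much as raw bandwidth.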
Thanks.
Garri
> On 28.02.2018 at 14:44, Garri Djavadyan wrote:
>> Hello FFmpeg community,
>>
>>
>> I have run into a problem with NVDEC/NVENC resource underutilization
>> while running a single ffmpeg instance.
>>
>> We use ffmpeg to convert videos in various formats to MP4 (H.264/AAC)
>> and to apply a logo overlay. Hardware decoding and encoding are
>> performed by the NVDEC and NVENC chips. For example, our command line is:
>>
>> /usr/local/ffmpeg-dev/bin/ffmpeg -y -c:v mpeg4_cuvid \
>> -i input.avi -i logo.png \
>> -filter_complex [0:v:0][1:v:0]overlay=10:10[out1] \
>> -map [out1] -map 0:a:0 -map_metadata -1 -map_chapters -1 \
>> -c:v h264_nvenc -b:v 1024k -r 25 \
>> -c:a libfdk_aac -b:a 128k \
>> -movflags faststart out.mp4
>>
>>
>> The overall transcoding process is greatly accelerated, but I see that
>> NVDEC/NVENC cycles are not fully utilized. For example:
>>
>> # nvidia-smi dmon
>> # gpu pwr temp sm mem enc dec mclk pclk
>> # Idx W C % % % % MHz MHz
>> 0 32 50 11 3 22 16 3802 1632
>> 0 32 50 11 3 23 14 3802 1632
>> 0 32 50 12 3 22 15 3802 1632
>> 0 33 51 13 3 18 15 3802 1632
>> 0 32 51 12 3 17 11 3802 1632
>> 0 32 51 10 2 20 13 3802 1632
>>
>>
>> I tried to find a bottleneck, but all system resources are OK. For
>> example, CPU (top output, at least 60% idle):
>>
>> Tasks: 296 total, 3 running, 292 sleeping, 0 stopped, 1 zombie
>> %Cpu0 : 14,6 us, 0,7 sy, 0,0 ni, 84,7 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
>> %Cpu1 : 11,2 us, 1,4 sy, 0,0 ni, 87,5 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
>> %Cpu2 : 14,5 us, 1,3 sy, 0,0 ni, 84,2 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
>> %Cpu3 : 4,4 us, 0,7 sy, 0,0 ni, 94,9 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
>> %Cpu4 : 39,6 us, 1,3 sy, 0,0 ni, 59,1 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
>> %Cpu5 : 2,4 us, 0,7 sy, 0,0 ni, 96,6 id, 0,3 wa, 0,0 hi, 0,0 si, 0,0 st
>> %Cpu6 : 2,7 us, 4,3 sy, 0,0 ni, 93,0 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
>> %Cpu7 : 21,0 us, 0,7 sy, 0,0 ni, 78,3 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
>> KiB Mem: 12261384 total, 10578352 used, 1683032 free, 4809968 buffers
>> KiB Swap: 7999484 total, 584072 used, 7415412 free. 1431448 cached Mem
>>
>>   PID USER  PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
>> 21245 user  20   0 17,516g 235792 187228 R 101,3  1,9   1:21.15 ffmpeg
>>  1512 root -51   0       0      0      0 S   7,0  0,0  17:29.39 irq/47-nvidia
>>
>> -----------------
>> Memory:
>>
>> # free -m
>>                     total     used     free   shared  buffers   cached
>> Mem:                11974     8619     3354      133     3758      653
>> -/+ buffers/cache:            4207     7766
>> Swap:                7811      570     7241
>>
>> -----------------
>> Storage I/O:
>>
>> # iostat -x 1 3
>> Linux 3.13.0-142-generic (user-desktop) 28.02.2018 _x86_64_ (8 CPU)
>>
>> avg-cpu:   %user   %nice %system %iowait  %steal   %idle
>>            10,86    3,03    1,43    4,75    0,00   79,93
>>
>> Device:   rrqm/s   wrqm/s      r/s      w/s    rkB/s    wkB/s avgrq-sz avgqu-sz    await  r_await  w_await    svctm    %util
>> sda         0,49     0,00     0,01     0,00     0,29     0,00    45,90     0,00    47,45    47,45     0,00    14,55     0,02
>> sdb       849,21   769,16    47,81    22,18  3859,21  3935,51   222,74     9,18   131,22    23,62   363,20     4,05    28,38
>>
>> avg-cpu:   %user   %nice %system %iowait  %steal   %idle
>>            14,65    0,00    1,78    0,00    0,00   83,57
>>
>> Device:   rrqm/s   wrqm/s      r/s      w/s    rkB/s    wkB/s avgrq-sz avgqu-sz    await  r_await  w_await    svctm    %util
>> sda         0,00     0,00     0,00     0,00     0,00     0,00     0,00     0,00     0,00     0,00     0,00     0,00     0,00
>> sdb         0,00     0,00     0,00     0,00     0,00     0,00     0,00     0,00     0,00     0,00     0,00     0,00     0,00
>>
>> avg-cpu:   %user   %nice %system %iowait  %steal   %idle
>>            21,24    0,00    2,53    0,51    0,00   75,73
>>
>> Device:   rrqm/s   wrqm/s      r/s      w/s    rkB/s    wkB/s avgrq-sz avgqu-sz    await  r_await  w_await    svctm    %util
>> sda         0,00     0,00     0,00     0,00     0,00     0,00     0,00     0,00     0,00     0,00     0,00     0,00     0,00
>> sdb         0,00     7,00     0,00     3,00     0,00    88,00    58,67     0,04    13,33     0,00    13,33    13,33     4,00
>>
>>
>> --------------------
>> FFmpeg version and configuration options:
>>
>> # /usr/local/ffmpeg-dev/bin/ffmpeg -version
>> ffmpeg version N-90054-g474194a Copyright (c) 2000-2018 the FFmpeg developers
>> built with gcc 4.8 (Ubuntu 4.8.4-2ubuntu1~14.04.4)
>> configuration: --prefix=/usr/local/ffmpeg-dev --enable-gpl --enable-nonfree --enable-libfdk-aac --enable-libx264 --enable-nvenc --enable-libnpp
>> libavutil 56. 7.101 / 56. 7.101
>> libavcodec 58. 11.101 / 58. 11.101
>> libavformat 58. 9.100 / 58. 9.100
>> libavdevice 58. 1.100 / 58. 1.100
>> libavfilter 7. 12.100 / 7. 12.100
>> libswscale 5. 0.101 / 5. 0.101
>> libswresample 3. 0.101 / 3. 0.101
>> libpostproc 55. 0.100 / 55. 0.100
>>
>>
>> ---------------------
>> NVIDIA driver and card information:
>>
>> # nvidia-smi
>> Wed Feb 28 17:20:13 2018
>> +-----------------------------------------------------------------------------+
>> | NVIDIA-SMI 384.111                Driver Version: 384.111                   |
>> |-------------------------------+----------------------+----------------------+
>> | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
>> | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
>> |===============================+======================+======================|
>> |   0  GeForce GTX 106...  Off  | 00000000:01:00.0  On |                  N/A |
>> |  0%   53C    P2    32W / 150W |    939MiB /  6071MiB |     11%      Default |
>> +-------------------------------+----------------------+----------------------+
>>
>> +-----------------------------------------------------------------------------+
>> | Processes:                                                       GPU Memory |
>> |  GPU       PID   Type   Process name                             Usage      |
>> |=============================================================================|
>> |    0      1423      G   /usr/bin/X                                   443MiB |
>> |    0      2555      G   compiz                                       224MiB |
>> |    0     14879      G   ...-token=ACEXXXXXXXXXXX2E9DDXXXXXXXE41      107MiB |
>> |    0     21458      C   /usr/local/ffmpeg-dev/bin/ffmpeg             159MiB |
>> +-----------------------------------------------------------------------------+
>>
>>
>> I believe I overlooked something, or maybe there are some limitations.
>> So, I kindly ask your suggestions. Many thanks in advance!
>>
>>
>> Garri
>> _______________________________________________
>> ffmpeg-user mailing list
>> ffmpeg-user at ffmpeg.org
>> http://ffmpeg.org/mailman/listinfo/ffmpeg-user
>>
>> To unsubscribe, visit link above, or email
>> ffmpeg-user-request at ffmpeg.org with subject "unsubscribe".