[FFmpeg-user] NVDEC/NVENC resources underutilization

Garri Djavadyan garryd at comnet.uz
Wed Feb 28 20:10:08 EET 2018


On 2018-02-28 19:53, Marcin Woźniak wrote:
> Try the same command but remove overlay filter and check.

I removed the filter and saw a slight increase in NVENC usage (33%). I
then ran the following checks with a minimal set of options.

-------------------
Full HW transcoding
-------------------

/usr/local/ffmpeg-dev/bin/ffmpeg -hwaccel cuvid -c:v mpeg4_cuvid \
   -i input.avi -map 0:v:0 -c:v h264_nvenc -b:v 1024k -f null -
...
frame=69703 fps=3086 q=19.0 Lsize=N/A time=00:46:28.32 bitrate=N/A speed= 123x


CPU usage:

top - 22:37:15 up 22:26, 11 users,  load average: 0.18, 0.35, 0.42
Threads: 1188 total,   2 running, 1185 sleeping,   0 stopped,   1 zombie
%Cpu0  : 21.7 us, 18.4 sy,  0.0 ni, 59.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  : 19.9 us, 17.8 sy,  0.0 ni, 62.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  :  1.3 us,  0.3 sy,  0.0 ni, 98.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  :  7.3 us,  5.6 sy,  0.0 ni, 87.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu4  :  0.0 us,  0.0 sy,  0.0 ni, 97.4 id,  2.6 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  :  1.0 us,  0.0 sy,  0.0 ni, 99.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu6  :  0.0 us,  8.6 sy,  0.0 ni, 91.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu7  :  0.3 us,  1.0 sy,  0.0 ni, 98.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:  12261384 total, 11810564 used,   450820 free,  1303564 buffers
KiB Swap:  7999484 total,   560488 used,  7438996 free.  5455608 cached Mem

   PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND        P
  6495 root      20   0 12.955g 116208 101472 R 83.9  0.9   0:06.99 `- ffmpeg      3
  6498 root      20   0 12.955g 116208 101472 S  0.0  0.9   0:00.00     `- ffmpeg  4
  6499 root      20   0 12.955g 116208 101472 S  7.6  0.9   0:00.59     `- ffmpeg  4
  6500 root      20   0 12.955g 116208 101472 S  0.0  0.9   0:00.00     `- ffmpeg  4
  6501 root      20   0 12.955g 116208 101472 S  0.0  0.9   0:00.00     `- ffmpeg  4
  6502 root      20   0 12.955g 116208 101472 S  0.0  0.9   0:00.00     `- ffmpeg  4
  6503 root      20   0 12.955g 116208 101472 S  0.0  0.9   0:00.00     `- ffmpeg  4
  6504 root      20   0 12.955g 116208 101472 S  0.0  0.9   0:00.00     `- ffmpeg  4
  6505 root      20   0 12.955g 116208 101472 S  0.0  0.9   0:00.00     `- ffmpeg  4
  6506 root      20   0 12.955g 116208 101472 S  0.0  0.9   0:00.00     `- ffmpeg  4
  6507 root      20   0 12.955g 116208 101472 S  0.0  0.9   0:00.00     `- ffmpeg  4
  6508 root      20   0 12.955g 116208 101472 S  0.0  0.9   0:00.00     `- ffmpeg  4


NVDEC/NVENC usage:

# nvidia-smi dmon
# gpu   pwr  temp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     %     %     %     %   MHz   MHz
     0    49    49    10    14    99    63  3802  2012
     0    49    50     9    14    99    59  3802  2012
     0    49    50    14    14    99    62  3802  2012
     0    49    50    10    14    99    62  3802  2012
     0    48    50    12    14    97    62  3802  2012
     0    48    51     9    14    99    63  3802  2012


----------------------
Partial HW transcoding
----------------------

/usr/local/ffmpeg-dev/bin/ffmpeg -c:v mpeg4_cuvid -i input.avi \
   -map 0:v:0 -c:v h264_nvenc -b:v 1024k -f null -
...
frame=69703 fps=2136 q=19.0 Lsize=N/A time=00:46:28.32 bitrate=N/A speed=85.4x


CPU usage:

top - 22:42:04 up 22:31, 11 users,  load average: 0.23, 0.29, 0.39
Threads: 1185 total,   3 running, 1181 sleeping,   0 stopped,   1 zombie
%Cpu0  :  6.6 us,  2.4 sy,  0.0 ni, 91.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  :  9.1 us,  2.4 sy,  0.0 ni, 88.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  : 24.2 us,  2.0 sy,  0.0 ni, 73.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  :  6.0 us,  1.7 sy,  0.0 ni, 92.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu4  :  3.0 us,  0.7 sy,  0.0 ni, 96.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  : 12.2 us,  3.7 sy,  0.0 ni, 84.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu6  :  0.7 us, 15.4 sy,  0.0 ni, 84.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu7  : 28.7 us,  2.0 sy,  0.0 ni, 69.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:  12261384 total, 11916268 used,   345116 free,  1276148 buffers
KiB Swap:  7999484 total,   561988 used,  7437496 free.  5450584 cached Mem

   PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND        P
  6620 root      20   0 17.235g 211408 186860 R 74.2  1.7   0:08.31 `- ffmpeg      4
  6623 root      20   0 17.235g 211408 186860 S  0.0  1.7   0:00.00     `- ffmpeg  7
  6624 root      20   0 17.235g 211408 186860 S 13.1  1.7   0:01.36     `- ffmpeg  5
  6625 root      20   0 17.235g 211408 186860 S  0.0  1.7   0:00.00     `- ffmpeg  0
  6626 root      20   0 17.235g 211408 186860 S  0.0  1.7   0:00.00     `- ffmpeg  7
  6627 root      20   0 17.235g 211408 186860 S  0.0  1.7   0:00.00     `- ffmpeg  4
  6628 root      20   0 17.235g 211408 186860 S  0.0  1.7   0:00.00     `- ffmpeg  7
  6629 root      20   0 17.235g 211408 186860 S  0.0  1.7   0:00.00     `- ffmpeg  7
  6630 root      20   0 17.235g 211408 186860 S  0.0  1.7   0:00.00     `- ffmpeg  7
  6631 root      20   0 17.235g 211408 186860 S  0.0  1.7   0:00.00     `- ffmpeg  7
  6632 root      20   0 17.235g 211408 186860 S  0.0  1.7   0:00.00     `- ffmpeg  7
  6633 root      20   0 17.235g 211408 186860 S  0.0  1.7   0:00.00     `- ffmpeg  7
  6634 root      20   0 17.235g 211408 186860 S  7.6  1.7   0:00.81     `- ffmpeg  2
  6635 root      20   0 17.235g 211408 186860 S  0.0  1.7   0:00.00     `- ffmpeg  0


NVDEC/NVENC usage:

# nvidia-smi dmon
# gpu   pwr  temp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     %     %     %     %   MHz   MHz
     0    46    49    34    10    69    42  3802  2012
     0    46    50    34    10    70    42  3802  2012
     0    46    50    34    10    70    42  3802  2012
     0    46    50    34    10    72    42  3802  2012
     0    46    50    34    10    72    42  3802  2012
     0    46    50    34    10    70    44  3802  2012
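
To reduce a dmon sample like the one above to a single figure per engine, the enc and dec columns can be averaged. The following is only a sketch: `dmon_avg` is a hypothetical helper name, and it assumes the default column order shown in the header above (enc is the 6th field, dec the 7th).

```shell
# Hypothetical helper: average the enc and dec columns of
# `nvidia-smi dmon` output read from stdin.
# Assumes the column layout: gpu pwr temp sm mem enc dec mclk pclk.
dmon_avg() {
    LC_ALL=C awk '
        # Skip the "#" header lines and anything too short to be a data row
        !/^#/ && NF >= 7 { enc += $6; dec += $7; n++ }
        END { if (n) printf "enc=%.1f dec=%.1f\n", enc / n, dec / n }
    '
}

# The six rows from the run above, fed in as a here-document
dmon_avg <<'EOF'
# gpu   pwr  temp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     %     %     %     %   MHz   MHz
     0    46    49    34    10    69    42  3802  2012
     0    46    50    34    10    70    42  3802  2012
     0    46    50    34    10    70    42  3802  2012
     0    46    50    34    10    72    42  3802  2012
     0    46    50    34    10    72    42  3802  2012
     0    46    50    34    10    70    44  3802  2012
EOF
# prints: enc=70.5 dec=42.3
```

On a live system the same helper could presumably be fed directly, for example with `nvidia-smi dmon -c 6 | dmon_avg`, provided the installed driver prints the same columns.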


I see a roughly 30% NVENC performance loss when the transcoding path is
NVDEC -> RAM -> NVENC. Does this mean that system memory bandwidth is
the bottleneck in this case, or am I facing other unavoidable overheads?
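
The fps values reported by the two ffmpeg runs above (3086 and 2136) give a similar figure. A minimal sketch of the arithmetic, where `fps_loss_pct` is a hypothetical helper and not part of ffmpeg:

```shell
# Hypothetical helper: throughput loss of a run relative to a baseline,
# as a percentage with one decimal place.
fps_loss_pct() {
    # $1 = baseline fps, $2 = measured fps
    LC_ALL=C awk -v base="$1" -v cur="$2" \
        'BEGIN { printf "%.1f\n", (1 - cur / base) * 100 }'
}

# fps of the full HW run vs. the partial HW run
fps_loss_pct 3086 2136
# prints: 30.8
```

This only restates the measured numbers; it does not by itself say where the time is lost.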

Thanks.


Garri


> On 28.02.2018 at 14:44, Garri Djavadyan wrote:
>> Hello FFmpeg community,
>> 
>> 
>> I have run into a problem with NVDEC/NVENC resource underutilization
>> while running a single ffmpeg instance.
>> 
>> We use ffmpeg to convert videos in various formats to MP4 (h264/aac)
>> and to apply a logo overlay. Hardware decoding and encoding are
>> performed by the NVDEC and NVENC engines. For example, our command
>> line is:
>> 
>> /usr/local/ffmpeg-dev/bin/ffmpeg -y -c:v mpeg4_cuvid \
>>    -i input.avi -i logo.png \
>>    -filter_complex "[0:v:0][1:v:0]overlay=10:10[out1]" \
>>    -map "[out1]" -map 0:a:0 -map_metadata -1 -map_chapters -1 \
>>    -c:v h264_nvenc -b:v 1024k -r 25 \
>>    -c:a libfdk_aac -b:a 128k \
>>    -movflags faststart out.mp4
>> 
>> 
>> The overall transcoding process is greatly accelerated, but I see
>> that the NVDEC/NVENC engines are not fully utilized. For example:
>> 
>> # nvidia-smi dmon
>> # gpu   pwr  temp    sm   mem   enc   dec  mclk  pclk
>> # Idx     W     C     %     %     %     %   MHz   MHz
>>      0    32    50    11     3    22    16  3802  1632
>>      0    32    50    11     3    23    14  3802  1632
>>      0    32    50    12     3    22    15  3802  1632
>>      0    33    51    13     3    18    15  3802  1632
>>      0    32    51    12     3    17    11  3802  1632
>>      0    32    51    10     2    20    13  3802  1632
>> 
>> 
>> I tried to find the bottleneck, but all system resources look fine.
>> For example, CPU (top output, each core roughly 60% idle or more):
>> 
>> Tasks: 296 total,   3 running, 292 sleeping,   0 stopped,   1 zombie
>> %Cpu0  : 14,6 us,  0,7 sy,  0,0 ni, 84,7 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
>> %Cpu1  : 11,2 us,  1,4 sy,  0,0 ni, 87,5 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
>> %Cpu2  : 14,5 us,  1,3 sy,  0,0 ni, 84,2 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
>> %Cpu3  :  4,4 us,  0,7 sy,  0,0 ni, 94,9 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
>> %Cpu4  : 39,6 us,  1,3 sy,  0,0 ni, 59,1 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
>> %Cpu5  :  2,4 us,  0,7 sy,  0,0 ni, 96,6 id,  0,3 wa,  0,0 hi,  0,0 si,  0,0 st
>> %Cpu6  :  2,7 us,  4,3 sy,  0,0 ni, 93,0 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
>> %Cpu7  : 21,0 us,  0,7 sy,  0,0 ni, 78,3 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
>> KiB Mem:  12261384 total, 10578352 used,  1683032 free,  4809968 buffers
>> KiB Swap:  7999484 total,   584072 used,  7415412 free.  1431448 cached Mem
>> 
>>    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
>>  21245 user      20   0 17,516g 235792 187228 R 101,3  1,9   1:21.15 ffmpeg
>>   1512 root     -51   0       0      0      0 S   7,0  0,0  17:29.39 irq/47-nvidia
>> 
>> -----------------
>> Memory:
>> 
>> # free -m
>>              total       used       free     shared    buffers     cached
>> Mem:         11974       8619       3354        133       3758        653
>> -/+ buffers/cache:       4207       7766
>> Swap:         7811        570       7241
>> 
>> -----------------
>> Storage I/O:
>> 
>> # iostat -x 1 3
>> Linux 3.13.0-142-generic (user-desktop)  28.02.2018  _x86_64_  (8 CPU)
>> 
>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>            10,86    3,03    1,43    4,75    0,00   79,93
>> 
>> Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>> sda               0,49     0,00    0,01    0,00     0,29     0,00    45,90     0,00   47,45   47,45    0,00  14,55   0,02
>> sdb             849,21   769,16   47,81   22,18  3859,21  3935,51   222,74     9,18  131,22   23,62  363,20   4,05  28,38
>> 
>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>            14,65    0,00    1,78    0,00    0,00   83,57
>> 
>> Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>> sda               0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00    0,00    0,00   0,00   0,00
>> sdb               0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00    0,00    0,00   0,00   0,00
>> 
>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>            21,24    0,00    2,53    0,51    0,00   75,73
>> 
>> Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>> sda               0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00    0,00    0,00   0,00   0,00
>> sdb               0,00     7,00    0,00    3,00     0,00    88,00    58,67     0,04   13,33    0,00   13,33  13,33   4,00
>> 
>> 
>> --------------------
>> FFmpeg version and configuration options:
>> 
>> # /usr/local/ffmpeg-dev/bin/ffmpeg -version
>> ffmpeg version N-90054-g474194a Copyright (c) 2000-2018 the FFmpeg developers
>> built with gcc 4.8 (Ubuntu 4.8.4-2ubuntu1~14.04.4)
>> configuration: --prefix=/usr/local/ffmpeg-dev --enable-gpl --enable-nonfree --enable-libfdk-aac --enable-libx264 --enable-nvenc --enable-libnpp
>> libavutil      56.  7.101 / 56.  7.101
>> libavcodec     58. 11.101 / 58. 11.101
>> libavformat    58.  9.100 / 58.  9.100
>> libavdevice    58.  1.100 / 58.  1.100
>> libavfilter     7. 12.100 /  7. 12.100
>> libswscale      5.  0.101 /  5.  0.101
>> libswresample   3.  0.101 /  3.  0.101
>> libpostproc    55.  0.100 / 55.  0.100
>> 
>> 
>> ---------------------
>> NVIDIA driver and card information:
>> 
>> # nvidia-smi
>> Wed Feb 28 17:20:13 2018
>> +-----------------------------------------------------------------------------+
>> | NVIDIA-SMI 384.111                Driver Version: 384.111                   |
>> |-------------------------------+----------------------+----------------------+
>> | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
>> | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
>> |===============================+======================+======================|
>> |   0  GeForce GTX 106...  Off  | 00000000:01:00.0  On |                  N/A |
>> |  0%   53C    P2    32W / 150W |    939MiB /  6071MiB |     11%      Default |
>> +-------------------------------+----------------------+----------------------+
>> 
>> +-----------------------------------------------------------------------------+
>> | Processes:                                                       GPU Memory |
>> |  GPU       PID   Type   Process name                             Usage      |
>> |=============================================================================|
>> |    0      1423      G   /usr/bin/X                                   443MiB |
>> |    0      2555      G   compiz                                       224MiB |
>> |    0     14879      G   ...-token=ACEXXXXXXXXXXX2E9DDXXXXXXXE41      107MiB |
>> |    0     21458      C   /usr/local/ffmpeg-dev/bin/ffmpeg             159MiB |
>> +-----------------------------------------------------------------------------+
>> 
>> 
>> I believe I have overlooked something, or perhaps there are
>> limitations I am not aware of. I would appreciate any suggestions.
>> Many thanks in advance!
>> 
>> 
>> Garri
>> _______________________________________________
>> ffmpeg-user mailing list
>> ffmpeg-user at ffmpeg.org
>> http://ffmpeg.org/mailman/listinfo/ffmpeg-user
>> 
>> To unsubscribe, visit link above, or email
>> ffmpeg-user-request at ffmpeg.org with subject "unsubscribe".