[FFmpeg-user] ffmpeg GPU selection issue

Dennis Mungai dmngaie at gmail.com
Wed Sep 4 18:28:56 EEST 2019


On Wed, 4 Sep 2019 at 10:39, Matthew Reus <matthew.reus01 at gmail.com> wrote:
>
> Hello
> I have an Ubuntu 18.04 server on which I installed FFmpeg, compiled the SDK,
> and installed everything the NVIDIA Tesla M60 driver requires.
>
>
> *1. Whenever I select a GPU, both GPU 1 and GPU 2 take on the process.*
>
> *2. FFmpeg frequently shows frame drops and video buffering.*
>
> *Here I have attached all the output and the script.*
>
> ffmpeg version N-94423-ga0c1970 Copyright (c) 2000-2019 the FFmpeg developers
>   built with gcc 7 (Ubuntu 7.4.0-1ubuntu1~18.04.1)
>   configuration: --prefix=/root/ffmpeg_build --pkg-config-flags=--static
> --extra-cflags=-I/root/ffmpeg_build/include
> --extra-ldflags=-L/root/ffmpeg_build/lib --extra-libs='-lpthread -lm'
> --bindir=/root/bin --enable-cuda --enable-cuvid --enable-libnpp
> --extra-cflags=-I../nv_sdk --extra-ldflags=-L../nv_sdk --enable-cuda-nvcc
> --enable-nvenc --extra-cflags=-I/usr/local/cuda/include/
> --extra-ldflags=-L/usr/local/cuda/lib64/ --enable-gpl --enable-libaom
> --enable-libass --enable-libfdk-aac --enable-vaapi --enable-libfreetype
> --enable-libmp3lame --enable-libopus --enable-libvorbis --enable-libvpx
> --enable-libx264 --enable-libx265 --enable-nonfree
>   libavutil      56. 32.100 / 56. 32.100
>   libavcodec     58. 55.100 / 58. 55.100
>   libavformat    58. 30.100 / 58. 30.100
>   libavdevice    58.  9.100 / 58.  9.100
>   libavfilter     7. 58.100 /  7. 58.100
>   libswscale      5.  6.100 /  5.  6.100
>   libswresample   3.  6.100 /  3.  6.100
>   libpostproc    55.  6.100 / 55.  6.100
> Hyper fast Audio and Video encoder
> usage: ffmpeg [options] [[infile options] -i infile]... {[outfile options] outfile}...
>
>
> *My test script is:*
> ffmpeg -hwaccel_device 1 -hwaccel auto \
>   -i 'udp://@224.2.2.21:5008?fifo_size=1000000\&overrun_nonfatal' \
>   -vf "hwupload_cuda,format=yuv420p|cuda,yadif_cuda=0:-1:0,scale_npp=-1:720" \
>   -c:v h264_nvenc -gpu 1 -b:v 1800k -c:a aac -aspect 16:9 -g 50 -b:a 64k \
>   -ar 44100 -ac 2 -f flv \
>   'rtmp://admin:netaccess@192.168.0.44:1935/nettv/netBBS11500.stream' \
>   </dev/null >/dev/null 2>/var/log/BBs1.log &
>
>
> [h264 @ 0x559b7b7248c0] SPS unavailable in decode_picture_timing
> [h264 @ 0x559b7b7248c0] non-existing PPS 0 referenced
> [h264 @ 0x559b7b7248c0] SPS unavailable in decode_picture_timing
> [h264 @ 0x559b7b7248c0] non-existing PPS 0 referenced
> [h264 @ 0x559b7b7248c0] decode_slice_header error
> [h264 @ 0x559b7b7248c0] no frame!
> [h264 @ 0x559b7b7248c0] SPS unavailable in decode_picture_timing
> [h264 @ 0x559b7b7248c0] non-existing PPS 0 referenced
> [h264 @ 0x559b7b7248c0] SPS unavailable in decode_picture_timing
> [h264 @ 0x559b7b7248c0] non-existing PPS 0 referenced
> [h264 @ 0x559b7b7248c0] decode_slice_header error
> [h264 @ 0x559b7b7248c0] no frame!
> [h264 @ 0x559b7b7248c0] SPS unavailable in decode_picture_timing
> [h264 @ 0x559b7b7248c0] non-existing PPS 0 referenced
> [h264 @ 0x559b7b7248c0] SPS unavailable in decode_picture_timing
> [h264 @ 0x559b7b7248c0] non-existing PPS 0 referenced
> [h264 @ 0x559b7b7248c0] decode_slice_header error
> [h264 @ 0x559b7b7248c0] no frame!
> [h264 @ 0x559b7b7248c0] mmco: unref short failure
>     Last message repeated 1 times
> [h264 @ 0x559b7b7248c0] number of reference frames (0+4) exceeds max (3; probably corrupt input), discarding one
> Input #0, mpegts, from 'udp://@224.2.2.21:5008?fifo_size=1000000\&overrun_nonfatal':
>   Duration: N/A, start: 7352.806033, bitrate: N/A
>   Program 60
>     Metadata:
>       service_name    : BBS TV 1
>       service_provider:
>     Stream #0:0[0x3d]: Video: h264 (Main) ([27][0][0][0] / 0x001B), yuv420p(tv, bt470bg, top first), 704x576 [SAR 12:11 DAR 4:3], 25 fps, 50 tbr, 90k tbn, 50 tbc
>     Stream #0:1[0x3e](eng): Audio: mp2 ([4][0][0][0] / 0x0004), 48000 Hz, stereo, s16p, 128 kb/s
> [rtmp @ 0x559b7b7241c0] Ignoring unsupported var reason
> [h264 @ 0x559b7b73eb80] Using auto hwaccel type cuda with new device created from 1.
> Stream mapping:
>   Stream #0:0 -> #0:0 (h264 (native) -> h264 (h264_nvenc))
>   Stream #0:1 -> #0:1 (mp2 (native) -> aac (native))
> Press [q] to stop, [?] for help
> [h264 @ 0x559b7c007e00] reference picture missing during reorder
> [h264 @ 0x559b7c007e00] Missing reference picture, default is 65297
> [h264 @ 0x559b7c024680] mmco: unref short failure
>     Last message repeated 1 times
> [h264 @ 0x559b7c024680] number of reference frames (0+4) exceeds max (3; probably corrupt input), discarding one
> [h264 @ 0x559b7c05d780] mmco: unref short failure
> [h264 @ 0x559b7c108940] mmco: unref short failure
> Output #0, flv, to 'rtmp://admin:netaccess@192.168.0.44:1935/nettv/netBBS11500.stream':
>   Metadata:
>     encoder         : Lavf58.30.100
>     Stream #0:0: Video: h264 (h264_nvenc) (Main) ([7][0][0][0] / 0x0007), cuda, 880x720 [SAR 16:11 DAR 16:9], q=-1--1, 1800 kb/s, 25 fps, 1k tbn, 25 tbc
>     Metadata:
>       encoder         : Lavc58.55.100 h264_nvenc
>     Side data:
>       cpb: bitrate max/min/avg: 0/0/1800000 buffer size: 3600000 vbv_delay: -1
>     Stream #0:1(eng): Audio: aac (LC) ([10][0][0][0] / 0x000A), 44100 Hz, stereo, fltp, 64 kb/s
>     Metadata:
>       encoder         : Lavc58.55.100 aac
> root@ubuntu:/var/log# 17.0 size=   68167kB time=00:04:54.44 bitrate=1896.6kbits/s speed=1.02x


Hello there,

Please try this. I've simplified your command a bit for legibility and
dropped the `<>` link wrappers that ended up around your URLs.

ffmpeg -fflags +genpts -vsync 1 -threads 4 \
-hwaccel nvdec -hwaccel_device 1 -hwaccel_output_format cuda \
-i 'udp://@224.2.2.21:5008?fifo_size=1000000&overrun_nonfatal=1' \
-vf "yadif_cuda=0:-1:0,scale_npp=-1:720" \
-c:v h264_nvenc -preset:v llhq -rc:v cbr_ld_hq -gpu 1 -b:v 1800k \
-maxrate:v 1800k -bufsize:v 1800k -r:v 25 -g:v 50 \
-c:a aac -b:a 64k -ar 44100 -ac 2 \
-f flv -flags +global_header -map 0 \
'rtmp://admin:netaccess@192.168.0.44:1935/nettv/netBBS11500.stream'

A few notes:

1. Note how we call up the hwaccel method. Please don't set this to
auto. Judging by your console output, you definitely have nvdec
available. Use it.

2. See how we request a specific output format from the decoder tied
to the hwaccel method (-hwaccel_output_format cuda). Decoded frames
then stay in GPU memory, so you can skip the unnecessary hwupload steps
in your previous script. Those extra steps will definitely slow you down.

3. The thread count (-threads 4) is deliberately set to a low value.
For hwaccels such as nvdec, this is ideal. Very high values (~16+)
may cause decoder initialization to fail, with warnings.

4. On encoder presets: you're using a second-generation Maxwell GPU (a
Tesla M60). Based on your previous command line, I assumed you're
targeting constant bitrate output. With that in mind, the command above
selects the low latency high quality preset (-preset:v llhq), overrides
its rate control method with constant bitrate, low delay high quality
mode (-rc:v cbr_ld_hq), and keeps your GOP size (-g:v 50) and frame
rate (-r:v 25).

5. On device selection: the decoder's device is set by -hwaccel_device,
and the encoder's by -gpu, so keep both pointing at the same index.
(When NVENC receives CUDA frames straight from the decoder, it should
run on the device those frames already live on.) Your mistake in the
previous command was calling up hwupload_cuda without specifying a
device: that filter creates its own CUDA device (index 0 unless you set
its device option) in the middle of the filter chain. Decoded frames
were copied from GPU 1 to system memory and then uploaded to GPU 0,
which likely explains why both GPUs picked up the process. Those
copies to and from system memory will definitely slow down the encoder.
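Taken together, the notes above can be sketched as a small shell script: two preflight checks (is the `cuda` hwaccel compiled in, and does this build's h264_nvenc know the `llhq` / `cbr_ld_hq` names) plus a wrapper that keeps decode, filters and encode on one GPU index. The function names `hwaccel_listed`, `nvenc_lists` and `transcode_on_gpu` are hypothetical, and the encoder settings are copied from the command above. Two assumptions worth checking on your build: that `-hwaccel nvdec` is remapped to the `cuda` hwaccel (which is the name `ffmpeg -hwaccels` prints), and that the preset and rate-control names still exist, since, as far as I know, newer FFmpeg and NVENC SDK releases changed the preset scheme.

```shell
#!/bin/sh
# Sketch: preflight checks plus a per-GPU wrapper for one channel.
# hwaccel_listed, nvenc_lists and transcode_on_gpu are hypothetical names.

# Succeeds if the "ffmpeg -hwaccels" listing on stdin contains method $1.
# Assumption: "-hwaccel nvdec" maps onto the cuda hwaccel, so check "cuda".
hwaccel_listed() {
  grep -qx "[[:space:]]*$1[[:space:]]*"
}

# Succeeds if the "ffmpeg -h encoder=h264_nvenc" text on stdin lists every
# preset / rate-control constant given as an argument (names vary by version).
nvenc_lists() {
  text=$(cat)
  for name in "$@"; do
    printf '%s\n' "$text" | grep -Eq "^[[:space:]]+${name}[[:space:]]" || return 1
  done
}

# Runs one UDP -> RTMP channel with decode, filters and encode all on GPU $1.
# Settings are copied from the command above; set FFMPEG=echo for a dry run.
transcode_on_gpu() {
  gpu=$1 src=$2 dst=$3
  ${FFMPEG:-ffmpeg} -fflags +genpts -vsync 1 -threads 4 \
    -hwaccel nvdec -hwaccel_device "$gpu" -hwaccel_output_format cuda \
    -i "$src" \
    -vf "yadif_cuda=0:-1:0,scale_npp=-1:720" \
    -c:v h264_nvenc -preset:v llhq -rc:v cbr_ld_hq -gpu "$gpu" \
    -b:v 1800k -maxrate:v 1800k -bufsize:v 1800k -r:v 25 -g:v 50 \
    -c:a aac -b:a 64k -ar 44100 -ac 2 \
    -f flv -flags +global_header -map 0 \
    "$dst"
}

# Example (needs ffmpeg and the NVIDIA driver):
#   ffmpeg -hide_banner -hwaccels 2>&1 | hwaccel_listed cuda || echo "no cuda hwaccel"
#   ffmpeg -hide_banner -h encoder=h264_nvenc 2>&1 | nvenc_lists llhq cbr_ld_hq || echo "preset/rc names differ"
#   transcode_on_gpu 1 'udp://@224.2.2.21:5008?fifo_size=1000000&overrun_nonfatal=1' \
#     'rtmp://admin:netaccess@192.168.0.44:1935/nettv/netBBS11500.stream' \
#     </dev/null >/dev/null 2>/var/log/BBs1.log &
```

To split channels across the two GPUs on the M60, you might start one job with `transcode_on_gpu 0 ...` and another with `transcode_on_gpu 1 ...`, then check placement with `nvidia-smi pmon -c 1`. If FFmpeg's device indices don't match the ones nvidia-smi reports, exporting `CUDA_DEVICE_ORDER=PCI_BUS_ID` before starting ffmpeg should make CUDA number the GPUs in PCI bus order, as nvidia-smi does; that is an assumption worth verifying on your box.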

As an example, with a single RTX 2080 on my laptop encoding one of the
C-band satellite capture samples from https://kodi.wiki/view/Samples :

cd ~/test
time ffmpeg -fflags +genpts -vsync 1 -threads 4 \
-hwaccel nvdec -hwaccel_device 0 -hwaccel_output_format cuda \
-i 'test.mkv' \
-vf "yadif_cuda=0:-1:0,scale_npp=-1:720" \
-c:v h264_nvenc -preset:v llhq -rc:v cbr_ld_hq -gpu 0 -b:v 1800k \
-maxrate:v 1800k -bufsize:v 1800k -r:v 59.94 -g:v 60 \
-c:a aac -b:a 64k -ar 44100 -ac 2 \
-f flv -flags +global_header -map 0 'test.flv'

And this runs at a sweet, sweet ~11x speed:

frame=17791 fps=657 q=27.0 Lsize=   68121kB time=00:04:56.92 bitrate=1879.4kbits/s dup=8895 drop=0 speed=  11x
video:65225kB audio:2336kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.829441%
[aac @ 0x561db5b25440] Qavg: 153.261

real    0m27.461s
user    0m15.690s
sys    0m1.404s


At the very least, on your hardware, each live channel should sustain
~1x (real-time) speed with no drops whatsoever. With a live UDP input,
~1x is also the ceiling, since ffmpeg can only go as fast as the stream
arrives.
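To check for drops on the live channel, here is a sketch that pulls the `dup=`, `drop=` and `speed=` fields from the last progress line in the stderr log your script already writes (/var/log/BBs1.log). The function name `last_progress` is hypothetical, and it assumes ffmpeg's usual progress format, where the status line is redrawn with carriage returns and, as far as I know, `dup=`/`drop=` only appear once one of them is non-zero.

```shell
#!/bin/sh
# Hypothetical helper: print dup=, drop= and speed= from the most recent
# ffmpeg progress line in the log file named by $1.
last_progress() {
  # ffmpeg redraws its progress line with '\r', so split on it first,
  # keep the last line carrying a speed= field, then extract the fields.
  tr '\r' '\n' < "$1" | grep 'speed=' | tail -n 1 |
    grep -Eo '(dup|drop)=[0-9]+|speed= *[0-9.]+x'
}

# Usage:
#   last_progress /var/log/BBs1.log
```

If `drop=` keeps climbing or `speed=` stays below ~1x, the channel is not keeping up with the incoming stream.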

Test and report back.

Warm regards,

Dennis.

