[FFmpeg-user] Trouble transcoding with cuda
Ray Randomnic
randomnicode at gmail.com
Wed Sep 4 04:24:04 EEST 2019
Hey folks,
I'm trying to transcode an HEVC (yuv420p10le) file to H.264 using NVENC on a
GTX 1650 and I'm having issues with what I assume are the pixel format
conversions on the hardware. My encode speed (in fps) is pretty low (see
below), far lower than what I get when transcoding HEVC -> HEVC. The ffmpeg
version is N-94578-gd6bd902599-gcff309097a+3 (on Windows 10, though I don't
think that's relevant). For the purposes of this experiment, let's say I'm
not concerned about lossiness from the format conversions.
I'd like to know what I'm doing wrong and what commands I can issue for the
following:
decode on GPU -> format conversion (if necessary) on GPU -> encode on GPU.
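In ffmpeg terms, I assume that would look roughly like the command below, with
a CUDA-side filter doing the conversion (the scale_npp format option is my
guess at that middle step; this is essentially test 9 further down):
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 -vf scale_npp=format=yuv420p -c:v h264_nvenc output.mp4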
I might not be understanding a few concepts.
The combinations of options that I thought were available and tried out
are:
- decoder (I mostly left this blank for auto) and encoder (always
h264_nvenc)
- hwaccel
- hwaccel_output_format
- filters (vf):
- format
- scale_npp (for format conversion on the GPU)
I have no idea what the pix_fmt option or filters like colorspace do when
hardware is involved (how is pix_fmt different from hwaccel_output_format?).
At this point I'm kind of stuck: I don't know how to convert formats on the
GPU (I assume the format conversion is currently happening on the CPU).
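For example, my (possibly wrong) understanding is that the output option
-pix_fmt yuv420p is just shorthand for appending a software format=yuv420p
filter, so something like the command below should behave the same as test 6
further down, with the conversion still done on the CPU:
ffmpeg -hwaccel cuda -i input.mp4 -pix_fmt yuv420p -c:v h264_nvenc output.mp4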
Input details:
ffprobe input.mp4
Stream #0:0(eng): Video: hevc (Main 10) (hvc1 / 0x31637668),
yuv420p10le(tv, bt2020nc/bt2020/smpte2084), 1920x1080, 24886 kb/s, SAR 1:1
DAR 16:9, 29.99 fps, ...
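(In case a narrower probe is useful, I assume the pixel format alone can be
pulled out with:
ffprobe -v error -select_streams v:0 -show_entries stream=pix_fmt -of default=noprint_wrappers=1:nokey=1 input.mp4
which prints yuv420p10le for this file.)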
Summary of the various combinations (- indicates the option was left blank,
X indicates the encode failed):
test | hwaccel | hwaccel_output_format | filter (vf)              | encode fps | note
  1  | cuda    | -                     | -                        | X          | Failed
  2  | cuda    | cuda                  | -                        | X          | Failed
  3  | cuda    | yuv420p               | -                        | 361        | Video messed up
  4  | cuda    | cuda                  | format=yuv420p           | X          | Failed
  5  | cuvid   | cuda                  | format=yuv420p           | 91         | Not using GPU decode
  6  | cuda    | -                     | format=yuv420p           | 161        | Not using GPU format conversion
  7  | cuvid   | -                     | format=yuv420p           | 91         | Not using GPU decode
  8  | cuda    | -                     | scale_npp=format=yuv420p | X          | Failed
  9  | cuda    | cuda                  | scale_npp=format=yuv420p | X          | Failed
I would expect a speed around that of test 3 (but without the screwed-up
video). Is there any way to convert the pixel formats on the hardware without
screwing up the video? On a related note, I'd love for someone to explain the
failing encodes.
Here are the details for corresponding encodes:
1. ffmpeg -loglevel verbose -hwaccel cuda -i input.mp4 -c:v h264_nvenc
output.mp4
Fails with the following:
[graph_1_in_0_1 @ 000001cc9670e4c0] tb:1/48000 samplefmt:fltp
samplerate:48000 chlayout:0x3
[hevc @ 000001cc8740fc00] NVDEC capabilities:
[hevc @ 000001cc8740fc00] format supported: yes, max_mb_count: 262144
[hevc @ 000001cc8740fc00] min_width: 144, max_width: 8192
[hevc @ 000001cc8740fc00] min_height: 144, max_height: 8192
[graph 0 input from stream 0:0 @ 000001cc87420840] w:1920 h:1080
pixfmt:p010le tb:1/90000 fr:30/1 sar:1/1 sws_param:flags=2
[h264_nvenc @ 000001cc8747fbc0] Loaded Nvenc version 9.0
[h264_nvenc @ 000001cc8747fbc0] Nvenc initialized successfully
[h264_nvenc @ 000001cc8747fbc0] 1 CUDA capable devices found
[h264_nvenc @ 000001cc8747fbc0] [ GPU #0 - < GeForce GTX 1650 > has
Compute SM 7.5 ]
[h264_nvenc @ 000001cc8747fbc0] 10 bit encode not supported
[h264_nvenc @ 000001cc8747fbc0] No NVENC capable devices found
[h264_nvenc @ 000001cc8747fbc0] Nvenc unloaded
Error initializing output stream 0:0 -- Error while opening encoder for
output stream #0:0 - maybe incorrect parameters such as bit_rate, rate,
width or height
2. ffmpeg -loglevel verbose -hwaccel cuda -hwaccel_output_format cuda -i
input.mp4 -c:v h264_nvenc output.mp4
Fails with the following:
[graph_1_in_0_1 @ 00000240b7932340] tb:1/48000 samplefmt:fltp
samplerate:48000 chlayout:0x3
[hevc @ 00000240b79e37c0] NVDEC capabilities:
[hevc @ 00000240b79e37c0] format supported: yes, max_mb_count: 262144
[hevc @ 00000240b79e37c0] min_width: 144, max_width: 8192
[hevc @ 00000240b79e37c0] min_height: 144, max_height: 8192
[graph 0 input from stream 0:0 @ 00000240b7937e00] w:1920 h:1080
pixfmt:cuda tb:1/90000 fr:30/1 sar:1/1 sws_param:flags=2
[h264_nvenc @ 00000240b7483700] Loaded Nvenc version 9.0
[h264_nvenc @ 00000240b7483700] Nvenc initialized successfully
[h264_nvenc @ 00000240b7483700] 10 bit encode not supported
[h264_nvenc @ 00000240b7483700] Provided device doesn't support required
NVENC features
[h264_nvenc @ 00000240b7483700] Nvenc unloaded
Error initializing output stream 0:0 -- Error while opening encoder for
output stream #0:0 - maybe incorrect parameters such as bit_rate, rate,
width or height
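For what it's worth, I assume the pixel formats the encoder advertises can be
listed with:
ffmpeg -hide_banner -h encoder=h264_nvenc
though the runtime check in the log above seems to be what actually decides.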
Alright, so it seems that the hardware H.264 encoder doesn't support 10-bit
encodes (and 10-bit frames are what come out of the decoder). So let's try
changing the format:
3. ffmpeg -loglevel verbose -hwaccel cuda -hwaccel_output_format yuv420p
-i input.mp4 -c:v h264_nvenc output.mp4
Pretty decent encode at ~360 fps. Alas, the video is screwed up; the colors
are weird:
[graph_1_in_0_1 @ 00000256c9ac7b40] tb:1/48000 samplefmt:fltp
samplerate:48000 chlayout:0x3
[hevc @ 00000256cbb737c0] NVDEC capabilities:
[hevc @ 00000256cbb737c0] format supported: yes, max_mb_count: 262144
[hevc @ 00000256cbb737c0] min_width: 144, max_width: 8192
[hevc @ 00000256cbb737c0] min_height: 144, max_height: 8192
[graph 0 input from stream 0:0 @ 00000256cbac7e00] w:1920 h:1080
pixfmt:yuv420p tb:1/90000 fr:30/1 sar:1/1 sws_param:flags=2
[h264_nvenc @ 00000256cb693700] Loaded Nvenc version 9.0
[h264_nvenc @ 00000256cb693700] Nvenc initialized successfully
[h264_nvenc @ 00000256cb693700] 1 CUDA capable devices found
[h264_nvenc @ 00000256cb693700] [ GPU #0 - < GeForce GTX 1650 > has
Compute SM 7.5 ]
[h264_nvenc @ 00000256cb693700] supports NVENC
Let's use the format filter to change the format instead:
4. ffmpeg -loglevel verbose -hwaccel cuda -hwaccel_output_format cuda -i
input.mp4 -vf format=yuv420p -c:v h264_nvenc output.mp4
Fails with the following:
[graph_1_in_0_1 @ 0000019390de5c80] tb:1/48000 samplefmt:fltp
samplerate:48000 chlayout:0x3
[hevc @ 00000193908675c0] NVDEC capabilities:
[hevc @ 00000193908675c0] format supported: yes, max_mb_count: 262144
[hevc @ 00000193908675c0] min_width: 144, max_width: 8192
[hevc @ 00000193908675c0] min_height: 144, max_height: 8192
[graph 0 input from stream 0:0 @ 00000193a031ee80] w:1920 h:1080
pixfmt:cuda tb:1/90000 fr:30/1 sar:1/1 sws_param:flags=2
[auto_scaler_0 @ 00000193b7aee780] w:iw h:ih flags:'bicubic' interl:0
[Parsed_format_0 @ 00000193908eee80] auto-inserting filter
'auto_scaler_0' between the filter 'graph 0 input from stream 0:0' and the
filter 'Parsed_format_0'
Impossible to convert between the formats supported by the filter 'graph
0 input from stream 0:0' and the filter 'auto_scaler_0'
Error reinitializing filters!
Failed to inject frame into filter network: Function not implemented
Error while processing the decoded data for stream #0:0
5. ffmpeg -loglevel verbose -hwaccel cuvid -hwaccel_output_format cuda
-i input.mp4 -vf format=yuv420p -c:v h264_nvenc output.mp4
Succeeds, but only encodes at around 91 fps, which I assume is because it
isn't using the GPU decoder. What is the difference between the cuvid and
cuda hwaccels (why did the previous attempt fail and this one succeed)? Here
is the relevant output (see also the note after this log):
[graph_1_in_0_1 @ 000002152cc3cc00] tb:1/48000 samplefmt:fltp
samplerate:48000 chlayout:0x3
[hevc @ 000002152ac33700] Initializing cuvid hwaccel
[AVHWFramesContext @ 000002152cc3f0c0] Pixel format 'yuv420p10le' is not
supported
[hevc @ 000002152ac33700] Error initializing a CUDA frame pool
cuvid hwaccel requested for input stream #0:0, but cannot be initialized.
[hevc @ 000002152ac33700] Error parsing NAL unit #2.
[hevc @ 000002152ac79180] Could not find ref with POC 0
Error while decoding stream #0:0: Operation not permitted
[graph 0 input from stream 0:0 @ 000002152d638b80] w:1920 h:1080
pixfmt:yuv420p10le tb:1/90000 fr:30/1 sar:1/1 sws_param:flags=2
[auto_scaler_0 @ 000002152ca176c0] w:iw h:ih flags:'bicubic' interl:0
[Parsed_format_0 @ 000002152d3fee40] auto-inserting filter
'auto_scaler_0' between the filter 'graph 0 input from stream 0:0' and the
filter 'Parsed_format_0'
[auto_scaler_0 @ 000002152ca176c0] w:1920 h:1080 fmt:yuv420p10le sar:1/1
-> w:1920 h:1080 fmt:yuv420p sar:1/1 flags:0x4
[h264_nvenc @ 000002152ac31800] Loaded Nvenc version 9.0
[h264_nvenc @ 000002152ac31800] Nvenc initialized successfully
[h264_nvenc @ 000002152ac31800] 1 CUDA capable devices found
[h264_nvenc @ 000002152ac31800] [ GPU #0 - < GeForce GTX 1650 > has
Compute SM 7.5 ]
[h264_nvenc @ 000002152ac31800] supports NVENC
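In case it helps to see what this build exposes, I assume the available
hwaccels and the separate cuvid decoders can be listed with:
ffmpeg -hide_banner -hwaccels
ffmpeg -hide_banner -decoders | findstr cuvid
but I haven't gotten any further than that in understanding the cuvid/cuda
split.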
Take out hwaccel_output_format:
6. ffmpeg -loglevel verbose -hwaccel cuda -i in.mp4 -vf format=yuv420p
-c:v h264_nvenc out.mp4
Succeeds, encoding at 161 fps (using both the GPU decoder and the GPU
encoder, but I believe the format conversion is happening on the CPU between
the two stages).
[graph_1_in_0_1 @ 0000025491bf2b00] tb:1/48000 samplefmt:fltp
samplerate:48000 chlayout:0x3
[hevc @ 0000025491b84900] NVDEC capabilities:
[hevc @ 0000025491b84900] format supported: yes, max_mb_count: 262144
[hevc @ 0000025491b84900] min_width: 144, max_width: 8192
[hevc @ 0000025491b84900] min_height: 144, max_height: 8192
[graph 0 input from stream 0:0 @ 0000025491c0eec0] w:1920 h:1080
pixfmt:p010le tb:1/90000 fr:30/1 sar:1/1 sws_param:flags=2
[auto_scaler_0 @ 00000254b747cfc0] w:iw h:ih flags:'bicubic' interl:0
[Parsed_format_0 @ 000002549203d840] auto-inserting filter
'auto_scaler_0' between the filter 'graph 0 input from stream 0:0' and the
filter 'Parsed_format_0'
[auto_scaler_0 @ 00000254b747cfc0] w:1920 h:1080 fmt:p010le sar:1/1 ->
w:1920 h:1080 fmt:yuv420p sar:1/1 flags:0x4
[h264_nvenc @ 00000254920a0f40] Loaded Nvenc version 9.0
[h264_nvenc @ 00000254920a0f40] Nvenc initialized successfully
[h264_nvenc @ 00000254920a0f40] 1 CUDA capable devices found
[h264_nvenc @ 00000254920a0f40] [ GPU #0 - < GeForce GTX 1650 > has
Compute SM 7.5 ]
[h264_nvenc @ 00000254920a0f40] supports NVENC
7. ffmpeg -loglevel verbose -hwaccel cuvid -i in.mp4 -vf format=yuv420p
-c:v h264_nvenc out.mp4
Only the encode is running on the GPU, not the decode (91 fps).
[graph_1_in_0_1 @ 000002163875b5c0] tb:1/48000 samplefmt:fltp
samplerate:48000 chlayout:0x3
[hevc @ 00000216380c3c00] Initializing cuvid hwaccel
[AVHWFramesContext @ 00000216387fc300] Pixel format 'yuv420p10le' is not
supported
[hevc @ 00000216380c3c00] Error initializing a CUDA frame pool
cuvid hwaccel requested for input stream #0:0, but cannot be initialized.
[hevc @ 00000216380c3c00] Error parsing NAL unit #2.
[hevc @ 000002163813d300] Could not find ref with POC 0
Error while decoding stream #0:0: Operation not permitted
[graph 0 input from stream 0:0 @ 00000216387594c0] w:1920 h:1080
pixfmt:yuv420p10le tb:1/90000 fr:30/1 sar:1/1 sws_param:flags=2
[auto_scaler_0 @ 000002164f8a0c40] w:iw h:ih flags:'bicubic' interl:0
[Parsed_format_0 @ 00000216387593c0] auto-inserting filter
'auto_scaler_0' between the filter 'graph 0 input from stream 0:0' and the
filter 'Parsed_format_0'
[auto_scaler_0 @ 000002164f8a0c40] w:1920 h:1080 fmt:yuv420p10le sar:1/1
-> w:1920 h:1080 fmt:yuv420p sar:1/1 flags:0x4
[h264_nvenc @ 0000021638590f40] Loaded Nvenc version 9.0
[h264_nvenc @ 0000021638590f40] Nvenc initialized successfully
[h264_nvenc @ 0000021638590f40] 1 CUDA capable devices found
[h264_nvenc @ 0000021638590f40] [ GPU #0 - < GeForce GTX 1650 > has
Compute SM 7.5 ]
[h264_nvenc @ 0000021638590f40] supports NVENC
Let's see if I can do the format conversion on the GPU (instead of GPU -> CPU
-> GPU) by using the scale_npp filter.
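For reference, I assume scale_npp's available options, including format, can
be checked with:
ffmpeg -hide_banner -h filter=scale_npp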
8. ffmpeg -loglevel verbose -hwaccel cuda -i input.mp4 -vf
scale_npp=format=yuv420p -c:v h264_nvenc output.mp4
Fails:
[graph_1_in_0_1 @ 0000022f3001e080] tb:1/48000 samplefmt:fltp
samplerate:48000 chlayout:0x3
[hevc @ 0000022f207d7f40] NVDEC capabilities:
[hevc @ 0000022f207d7f40] format supported: yes, max_mb_count: 262144
[hevc @ 0000022f207d7f40] min_width: 144, max_width: 8192
[hevc @ 0000022f207d7f40] min_height: 144, max_height: 8192
[graph 0 input from stream 0:0 @ 0000022f3034ee80] w:1920 h:1080
pixfmt:p010le tb:1/90000 fr:30/1 sar:1/1 sws_param:flags=2
[auto_scaler_0 @ 0000022f47b2d300] w:iw h:ih flags:'bicubic' interl:0
[Parsed_scale_npp_0 @ 0000022f20c49b40] auto-inserting filter
'auto_scaler_0' between the filter 'graph 0 input from stream 0:0' and the
filter 'Parsed_scale_npp_0'
Impossible to convert between the formats supported by the filter 'graph
0 input from stream 0:0' and the filter 'auto_scaler_0'
Error reinitializing filters!
Failed to inject frame into filter network: Function not implemented
Error while processing the decoded data for stream #0:0
9. ffmpeg -loglevel verbose -hwaccel cuda -hwaccel_output_format cuda -i
in.mp4 -vf scale_npp=format=yuv420p -c:v h264_nvenc out.mp4
Fails:
[graph_1_in_0_1 @ 00000200040adac0] tb:1/48000 samplefmt:fltp
samplerate:48000 chlayout:0x3
[hevc @ 00000200747b65c0] NVDEC capabilities:
[hevc @ 00000200747b65c0] format supported: yes, max_mb_count: 262144
[hevc @ 00000200747b65c0] min_width: 144, max_width: 8192
[hevc @ 00000200747b65c0] min_height: 144, max_height: 8192
[graph 0 input from stream 0:0 @ 00000200040aa8c0] w:1920 h:1080
pixfmt:cuda tb:1/90000 fr:30/1 sar:1/1 sws_param:flags=2
[Parsed_scale_npp_0 @ 0000020074c75b80] Unsupported input format: p010le
[Parsed_scale_npp_0 @ 0000020074c75b80] Failed to configure output pad
on Parsed_scale_npp_0
Error reinitializing filters!
Failed to inject frame into filter network: Function not implemented
Error while processing the decoded data for stream #0:0
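As a fallback, I guess the explicit GPU -> CPU -> GPU route could be spelled
out with hwdownload, roughly like the command below (which I assume is just a
more verbose version of what test 6 already does):
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 -vf hwdownload,format=p010le,format=yuv420p -c:v h264_nvenc output.mp4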
I'd appreciate any help or a pointer in the right direction (even to an
alternate mailing list).