[FFmpeg-user] Trouble transcoding with cuda
Ray Randomnic
randomnicode at gmail.com
Wed Sep 4 04:24:04 EEST 2019
Hey folks,
I'm trying to transcode an HEVC (yuv420p10le) file to H.264 using NVENC on a
GTX 1650 and I'm having issues with what I assume are the pixel format
conversions on the hardware. My encode speed (in fps) is pretty low (see
below), far lower than what I get when transcoding HEVC -> HEVC. The ffmpeg
version is N-94578-gd6bd902599-gcff309097a+3 (on Windows 10, though I don't
think that's relevant). For the purposes of this experiment, let's say I'm
not concerned about lossiness from the format conversions.
I'd like to know what I'm doing wrong and what commands I can issue for the
following:
decode on GPU -> format conversion (if necessary) on GPU -> encode on GPU.
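In ffmpeg terms, I assume that would look roughly like the command below, with
a CUDA-side filter doing the conversion (the scale_npp format option is my
guess at that middle step; this is essentially test 9 further down):
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 -vf scale_npp=format=yuv420p -c:v h264_nvenc output.mp4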
I might not be understanding a few concepts.
The combinations of options that I thought were available and tried out
are:
- decoder (I mostly left this blank for auto) and encoder (always
h264_nvenc)
- hwaccel
- hwaccel_output_format
- filters (vf):
- format
- scale_npp (for format conversion on the GPU)
I have no idea what the pix_fmt option or filters like colorspace do when
hardware is involved (how is pix_fmt different from hwaccel_output_format?).
At this point I'm kind of stuck: I don't know how to convert formats on the
GPU (I assume the format conversion is currently happening on the CPU).
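For example, my (possibly wrong) understanding is that the output option
-pix_fmt yuv420p is just shorthand for appending a software format=yuv420p
filter, so something like the command below should behave the same as test 6
further down, with the conversion still done on the CPU:
ffmpeg -hwaccel cuda -i input.mp4 -pix_fmt yuv420p -c:v h264_nvenc output.mp4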
Input details:
ffprobe input.mp4
Stream #0:0(eng): Video: hevc (Main 10) (hvc1 / 0x31637668),
yuv420p10le(tv, bt2020nc/bt2020/smpte2084), 1920x1080, 24886 kb/s, SAR 1:1
DAR 16:9, 29.99 fps, ...
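(In case a narrower probe is useful, I assume the pixel format alone can be
pulled out with:
ffprobe -v error -select_streams v:0 -show_entries stream=pix_fmt -of default=noprint_wrappers=1:nokey=1 input.mp4
which prints yuv420p10le for this file.)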
Summary of the various combinations (- indicates the option was left blank,
X indicates the encode failed):
test | hwaccel | hwaccel_output_format | filter (vf)              | encode fps | note
  1  | cuda    | -                     | -                        | X          | Failed
  2  | cuda    | cuda                  | -                        | X          | Failed
  3  | cuda    | yuv420p               | -                        | 361        | Video messed up
  4  | cuda    | cuda                  | format=yuv420p           | X          | Failed
  5  | cuvid   | cuda                  | format=yuv420p           | 91         | Not using GPU decode
  6  | cuda    | -                     | format=yuv420p           | 161        | Not using GPU format conversion
  7  | cuvid   | -                     | format=yuv420p           | 91         | Not using GPU decode
  8  | cuda    | -                     | scale_npp=format=yuv420p | X          | Failed
  9  | cuda    | cuda                  | scale_npp=format=yuv420p | X          | Failed
I would expect a speed around that of test 3 (but without the screwed-up
video). Is there any way to convert the pixel formats on the hardware without
screwing up the video? On a related note, I'd love for someone to explain the
failing encodes.
Here are the details for corresponding encodes:
1. ffmpeg -loglevel verbose -hwaccel cuda -i input.mp4 -c:v h264_nvenc
output.mp4
Fails with the following:
[graph_1_in_0_1 @ 000001cc9670e4c0] tb:1/48000 samplefmt:fltp
samplerate:48000 chlayout:0x3
[hevc @ 000001cc8740fc00] NVDEC capabilities:
[hevc @ 000001cc8740fc00] format supported: yes, max_mb_count: 262144
[hevc @ 000001cc8740fc00] min_width: 144, max_width: 8192
[hevc @ 000001cc8740fc00] min_height: 144, max_height: 8192
[graph 0 input from stream 0:0 @ 000001cc87420840] w:1920 h:1080
pixfmt:p010le tb:1/90000 fr:30/1 sar:1/1 sws_param:flags=2
[h264_nvenc @ 000001cc8747fbc0] Loaded Nvenc version 9.0
[h264_nvenc @ 000001cc8747fbc0] Nvenc initialized successfully
[h264_nvenc @ 000001cc8747fbc0] 1 CUDA capable devices found
[h264_nvenc @ 000001cc8747fbc0] [ GPU #0 - < GeForce GTX 1650 > has
Compute SM 7.5 ]
[h264_nvenc @ 000001cc8747fbc0] 10 bit encode not supported
[h264_nvenc @ 000001cc8747fbc0] No NVENC capable devices found
[h264_nvenc @ 000001cc8747fbc0] Nvenc unloaded
Error initializing output stream 0:0 -- Error while opening encoder for
output stream #0:0 - maybe incorrect parameters such as bit_rate, rate,
width or height
2. ffmpeg -loglevel verbose -hwaccel cuda -hwaccel_output_format cuda -i
input.mp4 -c:v h264_nvenc output.mp4
Fails with the following:
[graph_1_in_0_1 @ 00000240b7932340] tb:1/48000 samplefmt:fltp
samplerate:48000 chlayout:0x3
[hevc @ 00000240b79e37c0] NVDEC capabilities:
[hevc @ 00000240b79e37c0] format supported: yes, max_mb_count: 262144
[hevc @ 00000240b79e37c0] min_width: 144, max_width: 8192
[hevc @ 00000240b79e37c0] min_height: 144, max_height: 8192
[graph 0 input from stream 0:0 @ 00000240b7937e00] w:1920 h:1080
pixfmt:cuda tb:1/90000 fr:30/1 sar:1/1 sws_param:flags=2
[h264_nvenc @ 00000240b7483700] Loaded Nvenc version 9.0
[h264_nvenc @ 00000240b7483700] Nvenc initialized successfully
[h264_nvenc @ 00000240b7483700] 10 bit encode not supported
[h264_nvenc @ 00000240b7483700] Provided device doesn't support required
NVENC features
[h264_nvenc @ 00000240b7483700] Nvenc unloaded
Error initializing output stream 0:0 -- Error while opening encoder for
output stream #0:0 - maybe incorrect parameters such as bit_rate, rate,
width or height
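For what it's worth, I assume the pixel formats the encoder advertises can be
listed with:
ffmpeg -hide_banner -h encoder=h264_nvenc
though the runtime check in the log above seems to be what actually decides.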
Alright, so it seems that the hardware H.264 encoder doesn't support 10-bit
encodes (and 10-bit frames are what come out of the decoder). So let's try
changing the format:
3. ffmpeg -loglevel verbose -hwaccel cuda -hwaccel_output_format yuv420p
-i input.mp4 -c:v h264_nvenc output.mp4
Pretty decent encode at ~360 fps. Alas, the video is screwed up; the colors
are weird:
[graph_1_in_0_1 @ 00000256c9ac7b40] tb:1/48000 samplefmt:fltp
samplerate:48000 chlayout:0x3
[hevc @ 00000256cbb737c0] NVDEC capabilities:
[hevc @ 00000256cbb737c0] format supported: yes, max_mb_count: 262144
[hevc @ 00000256cbb737c0] min_width: 144, max_width: 8192
[hevc @ 00000256cbb737c0] min_height: 144, max_height: 8192
[graph 0 input from stream 0:0 @ 00000256cbac7e00] w:1920 h:1080
pixfmt:yuv420p tb:1/90000 fr:30/1 sar:1/1 sws_param:flags=2
[h264_nvenc @ 00000256cb693700] Loaded Nvenc version 9.0
[h264_nvenc @ 00000256cb693700] Nvenc initialized successfully
[h264_nvenc @ 00000256cb693700] 1 CUDA capable devices found
[h264_nvenc @ 00000256cb693700] [ GPU #0 - < GeForce GTX 1650 > has
Compute SM 7.5 ]
[h264_nvenc @ 00000256cb693700] supports NVENC
Let's use the format filter to change the format instead:
4. ffmpeg -loglevel verbose -hwaccel cuda -hwaccel_output_format cuda -i
input.mp4 -vf format=yuv420p -c:v h264_nvenc output.mp4
Fails with the following:
[graph_1_in_0_1 @ 0000019390de5c80] tb:1/48000 samplefmt:fltp
samplerate:48000 chlayout:0x3
[hevc @ 00000193908675c0] NVDEC capabilities:
[hevc @ 00000193908675c0] format supported: yes, max_mb_count: 262144
[hevc @ 00000193908675c0] min_width: 144, max_width: 8192
[hevc @ 00000193908675c0] min_height: 144, max_height: 8192
[graph 0 input from stream 0:0 @ 00000193a031ee80] w:1920 h:1080
pixfmt:cuda tb:1/90000 fr:30/1 sar:1/1 sws_param:flags=2
[auto_scaler_0 @ 00000193b7aee780] w:iw h:ih flags:'bicubic' interl:0
[Parsed_format_0 @ 00000193908eee80] auto-inserting filter
'auto_scaler_0' between the filter 'graph 0 input from stream 0:0' and the
filter 'Parsed_format_0'
Impossible to convert between the formats supported by the filter 'graph
0 input from stream 0:0' and the filter 'auto_scaler_0'
Error reinitializing filters!
Failed to inject frame into filter network: Function not implemented
Error while processing the decoded data for stream #0:0
5. ffmpeg -loglevel verbose -hwaccel cuvid -hwaccel_output_format cuda
-i input.mp4 -vf format=yuv420p -c:v h264_nvenc output.mp4
Succeeds, but only encodes at around 91 fps, which I assume is because it
isn't using the GPU decoder. What is the difference between the cuvid and
cuda hwaccels (why did the previous attempt fail and this one succeed)? Here
is the relevant output (see also the note after this log):
[graph_1_in_0_1 @ 000002152cc3cc00] tb:1/48000 samplefmt:fltp
samplerate:48000 chlayout:0x3
[hevc @ 000002152ac33700] Initializing cuvid hwaccel
[AVHWFramesContext @ 000002152cc3f0c0] Pixel format 'yuv420p10le' is not
supported
[hevc @ 000002152ac33700] Error initializing a CUDA frame pool
cuvid hwaccel requested for input stream #0:0, but cannot be initialized.
[hevc @ 000002152ac33700] Error parsing NAL unit #2.
[hevc @ 000002152ac79180] Could not find ref with POC 0
Error while decoding stream #0:0: Operation not permitted
[graph 0 input from stream 0:0 @ 000002152d638b80] w:1920 h:1080
pixfmt:yuv420p10le tb:1/90000 fr:30/1 sar:1/1 sws_param:flags=2
[auto_scaler_0 @ 000002152ca176c0] w:iw h:ih flags:'bicubic' interl:0
[Parsed_format_0 @ 000002152d3fee40] auto-inserting filter
'auto_scaler_0' between the filter 'graph 0 input from stream 0:0' and the
filter 'Parsed_format_0'
[auto_scaler_0 @ 000002152ca176c0] w:1920 h:1080 fmt:yuv420p10le sar:1/1
-> w:1920 h:1080 fmt:yuv420p sar:1/1 flags:0x4
[h264_nvenc @ 000002152ac31800] Loaded Nvenc version 9.0
[h264_nvenc @ 000002152ac31800] Nvenc initialized successfully
[h264_nvenc @ 000002152ac31800] 1 CUDA capable devices found
[h264_nvenc @ 000002152ac31800] [ GPU #0 - < GeForce GTX 1650 > has
Compute SM 7.5 ]
[h264_nvenc @ 000002152ac31800] supports NVENC
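In case it helps to see what this build exposes, I assume the available
hwaccels and the separate cuvid decoders can be listed with:
ffmpeg -hide_banner -hwaccels
ffmpeg -hide_banner -decoders | findstr cuvid
but I haven't gotten any further than that in understanding the cuvid/cuda
split.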
Take out hwaccel_output_format:
6. ffmpeg -loglevel verbose -hwaccel cuda -i in.mp4 -vf format=yuv420p
-c:v h264_nvenc out.mp4
Succeeds, encoding at 161 fps (using both the GPU decoder and the GPU
encoder, but I believe the format conversion is happening on the CPU between
the two stages).
[graph_1_in_0_1 @ 0000025491bf2b00] tb:1/48000 samplefmt:fltp
samplerate:48000 chlayout:0x3
[hevc @ 0000025491b84900] NVDEC capabilities:
[hevc @ 0000025491b84900] format supported: yes, max_mb_count: 262144
[hevc @ 0000025491b84900] min_width: 144, max_width: 8192
[hevc @ 0000025491b84900] min_height: 144, max_height: 8192
[graph 0 input from stream 0:0 @ 0000025491c0eec0] w:1920 h:1080
pixfmt:p010le tb:1/90000 fr:30/1 sar:1/1 sws_param:flags=2
[auto_scaler_0 @ 00000254b747cfc0] w:iw h:ih flags:'bicubic' interl:0
[Parsed_format_0 @ 000002549203d840] auto-inserting filter
'auto_scaler_0' between the filter 'graph 0 input from stream 0:0' and the
filter 'Parsed_format_0'
[auto_scaler_0 @ 00000254b747cfc0] w:1920 h:1080 fmt:p010le sar:1/1 ->
w:1920 h:1080 fmt:yuv420p sar:1/1 flags:0x4
[h264_nvenc @ 00000254920a0f40] Loaded Nvenc version 9.0
[h264_nvenc @ 00000254920a0f40] Nvenc initialized successfully
[h264_nvenc @ 00000254920a0f40] 1 CUDA capable devices found
[h264_nvenc @ 00000254920a0f40] [ GPU #0 - < GeForce GTX 1650 > has
Compute SM 7.5 ]
[h264_nvenc @ 00000254920a0f40] supports NVENC
7. ffmpeg -loglevel verbose -hwaccel cuvid -i in.mp4 -vf format=yuv420p
-c:v h264_nvenc out.mp4
Only the encode is running on the GPU, not the decode (91 fps).
[graph_1_in_0_1 @ 000002163875b5c0] tb:1/48000 samplefmt:fltp
samplerate:48000 chlayout:0x3
[hevc @ 00000216380c3c00] Initializing cuvid hwaccel
[AVHWFramesContext @ 00000216387fc300] Pixel format 'yuv420p10le' is not
supported
[hevc @ 00000216380c3c00] Error initializing a CUDA frame pool
cuvid hwaccel requested for input stream #0:0, but cannot be initialized.
[hevc @ 00000216380c3c00] Error parsing NAL unit #2.
[hevc @ 000002163813d300] Could not find ref with POC 0
Error while decoding stream #0:0: Operation not permitted
[graph 0 input from stream 0:0 @ 00000216387594c0] w:1920 h:1080
pixfmt:yuv420p10le tb:1/90000 fr:30/1 sar:1/1 sws_param:flags=2
[auto_scaler_0 @ 000002164f8a0c40] w:iw h:ih flags:'bicubic' interl:0
[Parsed_format_0 @ 00000216387593c0] auto-inserting filter
'auto_scaler_0' between the filter 'graph 0 input from stream 0:0' and the
filter 'Parsed_format_0'
[auto_scaler_0 @ 000002164f8a0c40] w:1920 h:1080 fmt:yuv420p10le sar:1/1
-> w:1920 h:1080 fmt:yuv420p sar:1/1 flags:0x4
[h264_nvenc @ 0000021638590f40] Loaded Nvenc version 9.0
[h264_nvenc @ 0000021638590f40] Nvenc initialized successfully
[h264_nvenc @ 0000021638590f40] 1 CUDA capable devices found
[h264_nvenc @ 0000021638590f40] [ GPU #0 - < GeForce GTX 1650 > has
Compute SM 7.5 ]
[h264_nvenc @ 0000021638590f40] supports NVENC
Let's see if I can do the format conversion on the GPU (instead of GPU -> CPU
-> GPU) by using the scale_npp filter.
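For reference, I assume scale_npp's available options, including format, can
be checked with:
ffmpeg -hide_banner -h filter=scale_npp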
8. ffmpeg -loglevel verbose -hwaccel cuda -i input.mp4 -vf
scale_npp=format=yuv420p -c:v h264_nvenc output.mp4
Fails:
[graph_1_in_0_1 @ 0000022f3001e080] tb:1/48000 samplefmt:fltp
samplerate:48000 chlayout:0x3
[hevc @ 0000022f207d7f40] NVDEC capabilities:
[hevc @ 0000022f207d7f40] format supported: yes, max_mb_count: 262144
[hevc @ 0000022f207d7f40] min_width: 144, max_width: 8192
[hevc @ 0000022f207d7f40] min_height: 144, max_height: 8192
[graph 0 input from stream 0:0 @ 0000022f3034ee80] w:1920 h:1080
pixfmt:p010le tb:1/90000 fr:30/1 sar:1/1 sws_param:flags=2
[auto_scaler_0 @ 0000022f47b2d300] w:iw h:ih flags:'bicubic' interl:0
[Parsed_scale_npp_0 @ 0000022f20c49b40] auto-inserting filter
'auto_scaler_0' between the filter 'graph 0 input from stream 0:0' and the
filter 'Parsed_scale_npp_0'
Impossible to convert between the formats supported by the filter 'graph
0 input from stream 0:0' and the filter 'auto_scaler_0'
Error reinitializing filters!
Failed to inject frame into filter network: Function not implemented
Error while processing the decoded data for stream #0:0
9. ffmpeg -loglevel verbose -hwaccel cuda -hwaccel_output_format cuda -i
in.mp4 -vf scale_npp=format=yuv420p -c:v h264_nvenc out.mp4
Fails:
[graph_1_in_0_1 @ 00000200040adac0] tb:1/48000 samplefmt:fltp
samplerate:48000 chlayout:0x3
[hevc @ 00000200747b65c0] NVDEC capabilities:
[hevc @ 00000200747b65c0] format supported: yes, max_mb_count: 262144
[hevc @ 00000200747b65c0] min_width: 144, max_width: 8192
[hevc @ 00000200747b65c0] min_height: 144, max_height: 8192
[graph 0 input from stream 0:0 @ 00000200040aa8c0] w:1920 h:1080
pixfmt:cuda tb:1/90000 fr:30/1 sar:1/1 sws_param:flags=2
[Parsed_scale_npp_0 @ 0000020074c75b80] Unsupported input format: p010le
[Parsed_scale_npp_0 @ 0000020074c75b80] Failed to configure output pad
on Parsed_scale_npp_0
Error reinitializing filters!
Failed to inject frame into filter network: Function not implemented
Error while processing the decoded data for stream #0:0
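As a fallback, I guess the explicit GPU -> CPU -> GPU route could be spelled
out with hwdownload, roughly like the command below (which I assume is just a
more verbose version of what test 6 already does):
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 -vf hwdownload,format=p010le,format=yuv420p -c:v h264_nvenc output.mp4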
I'd appreciate any help or a pointer in the right direction (even to an
alternate mailing list).