[FFmpeg-user] Nvidia Transcoding: Failing Using xstack (When Running Under systemd)

Shane Warren shanew at innovsys.com
Tue Nov 19 16:41:45 EET 2024


On Tue, 19 Nov 2024, 01:20 Shane Warren, <shanew at innovsys.com> wrote:

>> On Mon, 18 Nov 2024, 11:33 pm Shane Warren, <shanew at innovsys.com> wrote:
>>
>> >> I have been trying to track down why, when transcoding using xstack
>> >> with nvidia decoding and encoding, I get strange decoding issues in
>> >> ffmpeg.
>> >>
>> >> Note: I use two 1-minute-long .ts files for this example. If you want
>> >> my inputs, they are available here (as input1.ts and input2.ts):
>> >>
>> >>
>> >> https://drive.google.com/drive/folders/1mZ8xiNvz5ez1ULlNsy5a3KhnhaqQ2Hgo?usp=drive_link
>> >>
>> >> I got the latest ffmpeg and tried this command (xstacking 2 videos
>> >> into 1 output):
>> >>
>> >> ffmpeg -y -threads 2 -nostats -loglevel verbose -probesize 5M -filter_threads 4 -threads 2 -re -fflags +genpts -fflags discardcorrupt \
>> >> -extra_hw_frames 2 -hwaccel cuda -hwaccel_output_format cuda -threads 2 -thread_queue_size 4096 -heavy_compr 1 -thread_queue_size 4096 -re -i input1.ts \
>> >> -extra_hw_frames 2 -hwaccel cuda -hwaccel_output_format cuda -threads 2 -thread_queue_size 4096 -heavy_compr 1 -thread_queue_size 4096 -re -i input2.ts \
>> >> -filter_complex "\
>> >> [0:v:0]yadif_cuda=deint=interlaced,scale_cuda=768:432,hwdownload,format=nv12,fps=60000/1001[v0];\
>> >> [1:v:0]yadif_cuda=deint=interlaced,scale_cuda=768:432,hwdownload,format=nv12,fps=60000/1001[v1];\
>> >> [v0][v1] xstack=inputs=2:layout=0_0|0_h0[mosaic];\
>> >> [mosaic]hwupload_cuda,scale_cuda=w=1280:h=720:format=yuv420p:force_original_aspect_ratio=decrease,hwdownload,format=yuv420p,pad=1280:720:(ow-iw)/2:(oh-ih)/2,hwupload_cuda[out0]" \
>> >> -filter:a:0 "aresample=async=10000,volume=1.00" -c:a:0 ac3 -threads 2 -ac:a:0 6 -ar:a:0 48000 -b:a:0 384k \
>> >> -filter:a:1 "aresample=async=10000,volume=1.00" -c:a:1 ac3 -threads 2 -ac:a:1 6 -ar:a:1 48000 -b:a:1 384k \
>> >> -map "[out0]" -map "0:a:0" -map "1:a:0" \
>> >> -c:v h264_nvenc -b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 12000k -a53cc 1 -tune ll -zerolatency 1 -cbr 1 -forced-idr 1 -strict_gop 1 -threads 2 -profile:v high -level:v 4.2 -bf:v 0 -g:v 30 \
>> >> -f mpegts -muxrate 8238520 -pes_payload_size 1528 "udp://@225.105.0.37:10102?pkt_size=1316&bitrate=8238520&burst_bits=10528&ttl=64"
>> >>
>> >> If you run that command in Ubuntu 22.04 it works 100% fine and 
>> >> transcodes till the end of the input file(s).
>> >>
>> >> What doesn't work is if you start that process under systemd 
>> >> non-interactively like so:
>> >>
>> >> systemd-run -S
>> >>
>> >> If you then run that same command, it fails in a strange way.
>> >>
>> >> Note: It's important that you output to multicast. If I try the
>> >> same command outputting to a file, it works fine (my guess is that
>> >> any network-based output exhibits this behavior).
>> >>
>> >> You will see logs like this:
>> >>
>> >> [Parsed_scale_cuda_1 @ 0x55da86a03340] w:1920 h:1080 fmt:nv12 -> w:768 h:432 fmt:nv12
>> >>
>> >> And then there is a pause of about 1-2 seconds before another log comes out.
>> >>
>> >> Eventually (after many stalls and logs) this log comes out and the 
>> >> transcode stops:
>> >>
>> >> [vost#0:0/h264_nvenc @ 0x55da86a3f780] Error submitting a packet to the muxer: Cannot allocate memory
>> >>
>> >> I attached GDB to ffmpeg while it was stalled, and it is inside code
>> >> that is trying to compile a CUDA script.
>> >>
>> >> If I'm not doing xstack (I'm pretty sure this has to do with
>> >> multiple inputs) nvidia does not stall.
>> >>
>> >> Does anyone have any idea what is happening here? I launch ffmpeg
>> >> from a C++ wrapper daemon. If that daemon is started via systemd, then
>> >> nvidia transcodes with multiple inputs fail. However, if I launch my
>> >> daemon by hand at a terminal, it works fine.
>> >>
>> >> Thanks
>> >>
>>
>> > Paste the content of the systemd unit file here.
>> > Logs from the same (systemctl status unit-name.service) will also assist.
>> > That might help in understanding how and why the systemd unit is failing.
>>
>> systemd service file:
>>
>> [Unit]
>> Description=Transcoder Service
>> After=default.target
>> StartLimitInterval=0
>>
>> [Service]
>> Type=forking
>> ExecStart=/opt/bin/videotranscoder
>> Restart=always
>> RestartSec=15
>> TasksMax=infinity
>> LimitCORE=infinity
>>
>> [Install]
>> WantedBy=default.target
>>
>> Logs:
>>
>> * videotranscoder.service - Innovative Video Transcoder
>>      Loaded: loaded (/lib/systemd/system/videotranscoder.service; disabled; vendor preset: enabled)
>>      Active: active (running) since Mon 2024-11-18 16:11:09 CST; 3min 19s ago
>>     Process: 50296 ExecStart=/opt/bin/videotranscoder (code=exited, status=0/SUCCESS)
>>    Main PID: 50298 (videotranscoder)
>>       Tasks: 81
>>      Memory: 915.1M
>>         CPU: 1min 1.870s
>>      CGroup: /system.slice/videotranscoder.service
>>              |-50298 /opt/bin/videotranscoder
>>              |-50320 /bin/sh -c "/opt/bin/ffmpeg -y -threads 2 -nostats -nostdin -loglevel verbose -progress pipe:1 -probesize 5M -filter_threads 4 -threads 2 -re -fflags +genpts -fflags discardcorrupt -hwaccel_device 3 -extra_hw_frames 2 -hwaccel cuda -h>
>>              `-50322 /opt/bin/ffmpeg -y -threads 2 -nostats -nostdin -loglevel verbose -progress pipe:1 -probesize 5M -filter_threads 4 -threads 2 -re -fflags +genpts -fflags discardcorrupt -hwaccel_device 3 -extra_hw_frames 2 -hwaccel cuda -hwaccel_outpu>
>>
>> Nov 18 16:14:26 encoder10029unit4 videotranscoder:50296[50298]: FileTranscoder: [u:4,t:1,f:9f201704-a501-4e94-bce7-f3ac8e83a519.ts] Adding audio output: ac3, 6 channels, 384 kbps.
>> Nov 18 16:14:26 encoder10029unit4 videotranscoder:50296[50298]: FileTranscoder: [u:4,t:1,f:9f201704-a501-4e94-bce7-f3ac8e83a519.ts] Audio bitrate is 0, defaulting audio bitrate to 128k for aac.
>> Nov 18 16:14:26 encoder10029unit4 videotranscoder:50296[50298]: FileTranscoder: [u:4,t:1,f:9f201704-a501-4e94-bce7-f3ac8e83a519.ts] Adding audio output: aac, 2 channels, 128 kbps.
>> Nov 18 16:14:26 encoder10029unit4 videotranscoder:50296[50298]: FileTranscoder: transcode ffmpeg cmd (starting): ffmpeg -hide_banner -y -nostats -hwaccel_device 1 -hwaccel cuvid  -i /video/vod/in/9f201704-a501-4e94-bce7-f3ac8e83a519.ts -filter_complex "hw>
>> Nov 18 16:14:28 encoder10029unit4 videotranscoder:50296[50298]: VideoTranscodeApp: [u:4,t:3,p:1: 225.105.0.56:10102] [fifo @ 0x55c5facc4840] Recovery attempt #1
>> Nov 18 16:14:28 encoder10029unit4 videotranscoder:50296[50298]: VideoTranscodeApp: [u:4,t:3,p:1: 225.105.0.56:10102] [mpegts @ 0x55c5f6bd2900] service 1 using PCR in pid=256, pcr_period=20ms
>>                                                                 [mpegts @ 0x55c5f6bd2900] muxrate 8238520,
>> Nov 18 16:14:28 encoder10029unit4 videotranscoder:50296[50298]: VideoTranscodeApp: [u:4,t:3,p:1: 225.105.0.56:10102] sdt every 500 ms, pat/pmt every 100 ms
>> Nov 18 16:14:28 encoder10029unit4 videotranscoder:50296[50298]: VideoTranscodeApp: [u:4,t:3,p:1: 225.105.0.56:10102] [fifo @ 0x55c5facc4840] Recovery successful
>> Nov 18 16:14:28 encoder10029unit4 videotranscoder:50296[50298]: VideoTranscodeApp: [u:4,t:3,p:1: 225.105.0.56:10102] [fifo @ 0x55c5facc4840] FIFO queue flushed
>> Nov 18 16:14:28 encoder10029unit4 videotranscoder:50296[50298]: VideoTranscodeApp: [u:4,t:3,p:1: 225.105.0.56:10102] [AVIOContext @ 0x7fa8b4014300] Statistics: 5395788 bytes written, 0 seeks, 4657 writeouts
>> Nov 18 16:14:29 encoder10029unit4 videotranscoder:50296[50298]: VideoTranscodeApp: [u:4,t:3,p:1: 225.105.0.56:10102] [fifo @ 0x55c5facc4840] FIFO queue full
>>

> I see the problem.
>
> Your output is emulating CBR over mpegts, but it's overshooting.
> Lower your buffer size to about 5*(bitrate/fps). Assuming a frame rate of 30 fps, use -bufsize:v 1000k or thereabouts.

First, thanks for that info; I was never quite sure what buffer size was correct. However, after changing to that buffer size I get the same behavior.
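
For reference, the suggested rule of 5*(bitrate/fps) can be checked with a few lines of Python. This is only a sketch: the function name is made up for illustration, the 6000k bitrate is the -b:v value from the command, 30 fps is the assumption from the reply, and 60000/1001 is the rate set by the fps filter in the filter graph.

```python
def cbr_bufsize_bits(bitrate_bps: float, fps: float, frames: int = 5) -> int:
    """Buffer size in bits that holds `frames` frames at the target bitrate."""
    return round(frames * bitrate_bps / fps)


bitrate = 6_000_000  # -b:v 6000k from the command

# Frame rate assumed in the reply.
print(f"30 fps:         -bufsize:v {cbr_bufsize_bits(bitrate, 30) // 1000}k")

# Frame rate set by fps=60000/1001 in the filter graph.
print(f"60000/1001 fps: -bufsize:v {cbr_bufsize_bits(bitrate, 60000 / 1001) // 1000}k")
```

By this rule the value depends directly on the frame rate the encoder actually sees, so it roughly halves when going from 30 fps to 60000/1001 fps.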

The key thing is that I ran this under systemd-run for a reason: I was trying to show the simplest way to make this happen. I'm running a stock Ubuntu 22.04 with CUDA 12.4 and the latest stable NVIDIA driver. If anyone has an ffmpeg build with NVIDIA support on Ubuntu 22.04 with an NVIDIA card, this will fail for them too.

I'm baffled why it works fine when I start it from an interactive terminal (over ssh or directly on a connected keyboard/monitor), but exhibits this behavior when I start it from systemd-run or when it's started by a systemd unit (for example on a reboot or package install).
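
One way to narrow this down might be to compare the environment of an ffmpeg process started from a terminal with one started under systemd. The sketch below is not part of my setup and does not claim to identify the cause: the helper names are hypothetical, and it assumes Linux /proc and permission to read both processes' environ files.

```python
import sys
from pathlib import Path


def parse_environ(raw: bytes) -> dict[str, str]:
    """Parse the NUL-separated KEY=VALUE records of /proc/<pid>/environ."""
    env = {}
    for record in raw.split(b"\0"):
        if b"=" in record:
            key, _, value = record.partition(b"=")
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def diff_env(a: dict[str, str], b: dict[str, str]) -> dict:
    """Map each variable that differs to its (a, b) values; None means unset."""
    names = sorted(a.keys() | b.keys())
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}


def read_environ(pid: str) -> dict[str, str]:
    """Read the environment a running process was started with."""
    return parse_environ(Path(f"/proc/{pid}/environ").read_bytes())


# Only runs when two PIDs are given: <pid from terminal> <pid under systemd>
if __name__ == "__main__" and len(sys.argv) == 3:
    changes = diff_env(read_environ(sys.argv[1]), read_environ(sys.argv[2]))
    for name, (terminal, systemd) in changes.items():
        print(f"{name}: terminal={terminal!r} systemd={systemd!r}")
```

Saved under a name of your choice, it would be run with the PID of a working ffmpeg as the first argument and the PID of a stalled one as the second. Any variable that is set in one and missing in the other is a candidate to set explicitly in the unit file and retest.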

