[FFmpeg-user] Nvidia Transcoding: Failing Using xstack (When Running Under systemd)

Shane Warren shanew at innovsys.com
Tue Nov 19 00:17:14 EET 2024

On Mon, 18 Nov 2024, 11:33 pm Shane Warren, <shanew at innovsys.com> wrote:

>> I have been trying to track down why I get strange decoding issues in
>> ffmpeg when transcoding with xstack using NVIDIA decoding and encoding.
>> Note: I use two 1-minute .ts files for this example. If you want my
>> inputs, they are available here (as input1.ts and input2.ts):
>> https://drive.google.com/drive/folders/1mZ8xiNvz5ez1ULlNsy5a3KhnhaqQ2Hgo?usp=drive_link
>> I got the latest ffmpeg and tried this command (xstacking two videos into one output):
>> ffmpeg -y -threads 2 -nostats -loglevel verbose -probesize 5M -filter_threads 4 -threads 2 -re -fflags +genpts -fflags discardcorrupt \
>> -extra_hw_frames 2 -hwaccel cuda -hwaccel_output_format cuda -threads 2 -thread_queue_size 4096 -heavy_compr 1 -thread_queue_size 4096 -re -i input1.ts \
>> -extra_hw_frames 2 -hwaccel cuda -hwaccel_output_format cuda -threads 2 -thread_queue_size 4096 -heavy_compr 1 -thread_queue_size 4096 -re -i input2.ts \
>> -filter_complex "\
>> [0:v:0]yadif_cuda=deint=interlaced,scale_cuda=768:432,hwdownload,format=nv12,fps=60000/1001[v0];\
>> [1:v:0]yadif_cuda=deint=interlaced,scale_cuda=768:432,hwdownload,format=nv12,fps=60000/1001[v1];\
>> [v0][v1] xstack=inputs=2:layout=0_0|0_h0[mosaic];\
>> [mosaic]hwupload_cuda,scale_cuda=w=1280:h=720:format=yuv420p:force_original_aspect_ratio=decrease,hwdownload,format=yuv420p,pad=1280:720:(ow-iw)/2:(oh-ih)/2,hwupload_cuda[out0]" \
>> -filter:a:0 "aresample=async=10000,volume=1.00" -c:a:0 ac3 -threads 2 -ac:a:0 6 -ar:a:0 48000 -b:a:0 384k \
>> -filter:a:1 "aresample=async=10000,volume=1.00" -c:a:1 ac3 -threads 2 -ac:a:1 6 -ar:a:1 48000 -b:a:1 384k \
>> -map "[out0]" -map "0:a:0" -map "1:a:0" \
>> -c:v h264_nvenc -b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 12000k -a53cc 1 -tune ll -zerolatency 1 -cbr 1 -forced-idr 1 -strict_gop 1 -threads 2 -profile:v high -level:v 4.2 -bf:v 0 -g:v 30 \
>> -f mpegts -muxrate 8238520 -pes_payload_size 1528 "udp://@
>> If you run that command on Ubuntu 22.04, it works 100% fine and
>> transcodes to the end of the input file(s).
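The filtergraph in the command above repeats one per-input chain (deinterlace and scale on the GPU, download, set the frame rate) and then stacks the results vertically with xstack, where layout `0_0|0_h0` places the second input directly below the first. A minimal POSIX sh sketch of generating the same graph for N inputs, assuming the same per-input chain and the vertical layout extended as `0_0|0_h0|0_h0+h1|...`; `build_mosaic_graph` is a hypothetical helper, not part of ffmpeg:

```sh
#!/bin/sh
# Hypothetical helper: prints a -filter_complex string that deinterlaces and
# scales N inputs (N >= 2) on the GPU, downloads them, and stacks them vertically.
build_mosaic_graph() {
  n=$1
  graph=""
  labels=""
  layout=""
  ysum=""   # running sum of the heights above the current input: h0+h1+...
  i=0
  while [ "$i" -lt "$n" ]; do
    # Same per-input chain as the original command, labelled [v$i].
    graph="${graph}[${i}:v:0]yadif_cuda=deint=interlaced,scale_cuda=768:432,hwdownload,format=nv12,fps=60000/1001[v${i}];"
    labels="${labels}[v${i}]"
    # The first input sits at 0_0; each later one sits below all earlier heights.
    if [ -z "$ysum" ]; then
      entry="0_0"
      ysum="h${i}"
    else
      entry="0_${ysum}"
      ysum="${ysum}+h${i}"
    fi
    if [ -z "$layout" ]; then layout="$entry"; else layout="${layout}|${entry}"; fi
    i=$((i + 1))
  done
  graph="${graph}${labels}xstack=inputs=${n}:layout=${layout}[mosaic];"
  # Final scale/pad stage copied unchanged from the original command.
  graph="${graph}[mosaic]hwupload_cuda,scale_cuda=w=1280:h=720:format=yuv420p:force_original_aspect_ratio=decrease,hwdownload,format=yuv420p,pad=1280:720:(ow-iw)/2:(oh-ih)/2,hwupload_cuda[out0]"
  printf '%s\n' "$graph"
}
```

Usage would be along the lines of `-filter_complex "$(build_mosaic_graph 2)"`; for two inputs it yields the graph from the command above, apart from the space before `xstack`.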
>> What doesn't work is starting that process under systemd
>> non-interactively, like so:
>> systemd-run -S
>> If you then run the same command, it fails in a strange way.
>> Note: it's important that you output to multicast. If I run the same
>> command with a file as the output, it works fine (my guess is that any
>> network-based output exhibits this behavior).
>> You will see log lines like this:
>> [Parsed_scale_cuda_1 @ 0x55da86a03340] w:1920 h:1080 fmt:nv12 -> w:768 h:432 fmt:nv12
>> and then about 1-2 seconds pass before the next log line comes out.
>> Eventually (after many stalls and log lines), this message appears and
>> the transcode stops:
>> [vost#0:0/h264_nvenc @ 0x55da86a3f780] Error submitting a packet to the muxer: Cannot allocate memory
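To put numbers on the 1-2 second stalls between log lines, one option is to timestamp ffmpeg's log output as it arrives and flag long gaps. A minimal sketch, assuming GNU `date` (for the `%N` field) and a POSIX `awk`; `stamp_lines` and `flag_gaps` are hypothetical helper names:

```sh
#!/bin/sh
# stamp_lines: prefix each stdin line with the wall-clock time in seconds
# (GNU date is assumed for the %N nanoseconds field).
stamp_lines() {
  while IFS= read -r line; do
    printf '%s %s\n' "$(date +%s.%N)" "$line"
  done
}

# flag_gaps THRESHOLD: read "SECONDS TEXT" lines and print every line that
# arrived more than THRESHOLD seconds after the previous one, without its stamp.
flag_gaps() {
  awk -v t="$1" '
    NR > 1 && ($1 - prev) > t {
      printf "gap %.2fs before: %s\n", $1 - prev, substr($0, index($0, " ") + 1)
    }
    { prev = $1 }
  '
}

# Example: ffmpeg ... 2>&1 | stamp_lines | flag_gaps 1
```

Forking `date` once per line adds a little latency of its own, so very small gaps are noise; gaps of one second or more are well above that.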
>> I attached GDB to ffmpeg while it was stalled, and it was inside code
>> trying to compile a CUDA script.
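For anyone reproducing the stall, GDB's batch mode can record where every thread is stuck without an interactive session. A minimal sketch; `capture_stall_backtrace` is a hypothetical helper, and it assumes `gdb` is installed and that you are allowed to ptrace the process (typically root, or the same user depending on `kernel.yama.ptrace_scope`):

```sh
#!/bin/sh
# capture_stall_backtrace PID [OUTFILE]
# Attaches GDB to a running process, writes a backtrace of every thread to
# OUTFILE, then exits, detaching so the process keeps running.
capture_stall_backtrace() {
  pid=$1
  out=${2:-ffmpeg-$pid-backtrace.txt}
  case $pid in
    ''|*[!0-9]*)
      echo "usage: capture_stall_backtrace PID [OUTFILE]" >&2
      return 2
      ;;
  esac
  # -batch runs the -ex commands and then exits.
  gdb -p "$pid" -batch -ex 'thread apply all bt' >"$out" 2>&1
}

# Example, during a stall: capture_stall_backtrace "$(pgrep -n -x ffmpeg)"
```

FFmpeg's CUDA filters (such as `yadif_cuda` and `scale_cuda`) load their kernels as PTX through the CUDA driver, which JIT-compiles them, so that is likely the "compile" step the backtrace points at; the full per-thread backtrace would confirm it.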
>> If I'm not using xstack (I'm pretty sure this has to do with multiple
>> inputs), NVIDIA does not stall.
>> Does anyone have any idea what is happening here? I launch ffmpeg from a
>> C++ wrapper daemon. If that daemon is started via systemd, NVIDIA
>> transcodes with multiple inputs fail; however, if I launch my daemon by
>> hand at a terminal, it works fine.
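Since the same command behaves differently depending only on how the daemon is launched, one thing worth comparing is the environment each ffmpeg process actually receives. A minimal sketch that prints a Linux `/proc/PID/environ` block (NUL-separated) as sorted lines so two dumps can be diffed; `dump_env` and the output file names are hypothetical:

```sh
#!/bin/sh
# dump_env FILE: print a NUL-separated environment block (e.g. /proc/PID/environ)
# as NAME=VALUE lines, sorted bytewise so two dumps diff cleanly.
dump_env() {
  tr '\0' '\n' <"$1" | LC_ALL=C sort
}

# Example (run once with the daemon under systemd, once started by hand):
#   dump_env /proc/"$(pgrep -n -x ffmpeg)"/environ >env-systemd.txt
#   dump_env /proc/"$(pgrep -n -x ffmpeg)"/environ >env-terminal.txt
#   diff env-systemd.txt env-terminal.txt
```

As an assumption worth ruling out rather than a confirmed cause: the CUDA driver caches JIT-compiled kernels under `~/.nv/ComputeCache` unless `CUDA_CACHE_PATH` is set, and systemd sets `$HOME` for a service when `User=` is set, so a missing or different `HOME` in the systemd dump is one variable to look at.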
>> Thanks

> Paste the content of the systemd unit file here.
> Logs from the same (systemctl status unit-name.service) will also assist.
> That might help in understanding how and why the systemd unit is failing.

systemd service file:

Description=Transcoder Service

systemctl status output:

* videotranscoder.service - Innovative Video Transcoder
     Loaded: loaded (/lib/systemd/system/videotranscoder.service; disabled; vendor preset: enabled)
     Active: active (running) since Mon 2024-11-18 16:11:09 CST; 3min 19s ago
    Process: 50296 ExecStart=/opt/bin/videotranscoder (code=exited, status=0/SUCCESS)
   Main PID: 50298 (videotranscoder)
      Tasks: 81
     Memory: 915.1M
        CPU: 1min 1.870s
     CGroup: /system.slice/videotranscoder.service
             |-50298 /opt/bin/videotranscoder
             |-50320 /bin/sh -c "/opt/bin/ffmpeg -y -threads 2 -nostats -nostdin -loglevel verbose -progress pipe:1 -probesize 5M -filter_threads 4 -threads 2 -re -fflags +genpts -fflags discardcorrupt -hwaccel_device 3 -extra_hw_frames 2 -hwaccel cuda -h>
             `-50322 /opt/bin/ffmpeg -y -threads 2 -nostats -nostdin -loglevel verbose -progress pipe:1 -probesize 5M -filter_threads 4 -threads 2 -re -fflags +genpts -fflags discardcorrupt -hwaccel_device 3 -extra_hw_frames 2 -hwaccel cuda -hwaccel_outpu>

Nov 18 16:14:26 encoder10029unit4 videotranscoder:50296[50298]: FileTranscoder: [u:4,t:1,f:9f201704-a501-4e94-bce7-f3ac8e83a519.ts] Adding audio output: ac3, 6 channels, 384 kbps.
Nov 18 16:14:26 encoder10029unit4 videotranscoder:50296[50298]: FileTranscoder: [u:4,t:1,f:9f201704-a501-4e94-bce7-f3ac8e83a519.ts] Audio bitrate is 0, defaulting audio bitrate to 128k for aac.
Nov 18 16:14:26 encoder10029unit4 videotranscoder:50296[50298]: FileTranscoder: [u:4,t:1,f:9f201704-a501-4e94-bce7-f3ac8e83a519.ts] Adding audio output: aac, 2 channels, 128 kbps.
Nov 18 16:14:26 encoder10029unit4 videotranscoder:50296[50298]: FileTranscoder: transcode ffmpeg cmd (starting): ffmpeg -hide_banner -y -nostats -hwaccel_device 1 -hwaccel cuvid  -i /video/vod/in/9f201704-a501-4e94-bce7-f3ac8e83a519.ts -filter_complex "hw>
Nov 18 16:14:28 encoder10029unit4 videotranscoder:50296[50298]: VideoTranscodeApp: [u:4,t:3,p:1:] [fifo @ 0x55c5facc4840] Recovery attempt #1
Nov 18 16:14:28 encoder10029unit4 videotranscoder:50296[50298]: VideoTranscodeApp: [u:4,t:3,p:1:] [mpegts @ 0x55c5f6bd2900] service 1 using PCR in pid=256, pcr_period=20ms
                                                                [mpegts @ 0x55c5f6bd2900] muxrate 8238520,
Nov 18 16:14:28 encoder10029unit4 videotranscoder:50296[50298]: VideoTranscodeApp: [u:4,t:3,p:1:] sdt every 500 ms, pat/pmt every 100 ms
Nov 18 16:14:28 encoder10029unit4 videotranscoder:50296[50298]: VideoTranscodeApp: [u:4,t:3,p:1:] [fifo @ 0x55c5facc4840] Recovery successful
Nov 18 16:14:28 encoder10029unit4 videotranscoder:50296[50298]: VideoTranscodeApp: [u:4,t:3,p:1:] [fifo @ 0x55c5facc4840] FIFO queue flushed
Nov 18 16:14:28 encoder10029unit4 videotranscoder:50296[50298]: VideoTranscodeApp: [u:4,t:3,p:1:] [AVIOContext @ 0x7fa8b4014300] Statistics: 5395788 bytes written, 0 seeks, 4657 writeouts
Nov 18 16:14:29 encoder10029unit4 videotranscoder:50296[50298]: VideoTranscodeApp: [u:4,t:3,p:1:] [fifo @ 0x55c5facc4840] FIFO queue full
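The excerpt above shows the fifo recovery and "FIFO queue full" messages only in passing. To see how often they occur over a whole run, here is a sketch that filters a log down to those events and the ENOMEM failure. `fifo_events` is a hypothetical helper, and its match strings are copied from the log lines quoted in this thread:

```sh
#!/bin/sh
# fifo_events: keep only fifo-muxer recovery/queue lines and the muxer
# "Cannot allocate memory" error from a log read on stdin.
fifo_events() {
  grep -E 'Recovery (attempt|successful)|FIFO queue (full|flushed)|Cannot allocate memory'
}

# Example (requires access to the journal):
#   journalctl -u videotranscoder.service --no-pager | fifo_events
```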

