[FFmpeg-user] Nvidia Transcoding: Failing Using xstack (When Running Under systemd)

Dennis Mungai dmngaie at gmail.com
Tue Nov 19 01:05:05 EET 2024


On Tue, 19 Nov 2024, 01:20 Shane Warren, <shanew at innovsys.com> wrote:

> On Mon, 18 Nov 2024, 11:33 pm Shane Warren, <shanew at innovsys.com> wrote:
>
> >> I have been trying to track down why, when transcoding using xstack
> >> with nvidia decoding and encoding, I get strange decoding issues in
> >> ffmpeg.
> >>
> >> Note: I use two 1-minute-long .ts files for this example. If you want my
> >> inputs, they are available here (as input1.ts and input2.ts):
> >>
> >>
> >> https://drive.google.com/drive/folders/1mZ8xiNvz5ez1ULlNsy5a3KhnhaqQ2Hgo?usp=drive_link
> >>
> >> I got the latest ffmpeg and tried this command (xstacking 2 videos
> >> into 1 output):
> >>
> >> ffmpeg -y -threads 2 -nostats -loglevel verbose -probesize 5M -filter_threads 4 -threads 2 -re -fflags +genpts -fflags discardcorrupt \
> >> -extra_hw_frames 2 -hwaccel cuda -hwaccel_output_format cuda -threads 2 -thread_queue_size 4096 -heavy_compr 1 -thread_queue_size 4096 -re -i input1.ts \
> >> -extra_hw_frames 2 -hwaccel cuda -hwaccel_output_format cuda -threads 2 -thread_queue_size 4096 -heavy_compr 1 -thread_queue_size 4096 -re -i input2.ts \
> >> -filter_complex "\
> >> [0:v:0]yadif_cuda=deint=interlaced,scale_cuda=768:432,hwdownload,format=nv12,fps=60000/1001[v0]; \
> >> [1:v:0]yadif_cuda=deint=interlaced,scale_cuda=768:432,hwdownload,format=nv12,fps=60000/1001[v1]; \
> >> [v0][v1] xstack=inputs=2:layout=0_0|0_h0[mosaic];\
> >> [mosaic]hwupload_cuda,scale_cuda=w=1280:h=720:format=yuv420p:force_original_aspect_ratio=decrease,hwdownload,format=yuv420p,pad=1280:720:(ow-iw)/2:(oh-ih)/2,hwupload_cuda[out0]" \
> >> -filter:a:0 "aresample=async=10000,volume=1.00" -c:a:0 ac3 -threads 2 -ac:a:0 6 -ar:a:0 48000 -b:a:0 384k \
> >> -filter:a:1 "aresample=async=10000,volume=1.00" -c:a:1 ac3 -threads 2 -ac:a:1 6 -ar:a:1 48000 -b:a:1 384k \
> >> -map "[out0]" -map "0:a:0" -map "1:a:0" \
> >> -c:v h264_nvenc -b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 12000k -a53cc 1 -tune ll -zerolatency 1 -cbr 1 -forced-idr 1 -strict_gop 1 -threads 2 -profile:v high -level:v 4.2 -bf:v 0 -g:v 30 \
> >> -f mpegts -muxrate 8238520 -pes_payload_size 1528 "udp://@225.105.0.37:10102?pkt_size=1316&bitrate=8238520&burst_bits=10528&ttl=64"
> >>
> >> If you run that command in Ubuntu 22.04 it works 100% fine and
> >> transcodes till the end of the input file(s).
> >>
> >> What doesn't work is if you start that process under systemd
> >> non-interactively like so:
> >>
> >> systemd-run -S
> >>
> >> If you then run that same command, it fails in a strange way.
> >>
> >> Note: It's important that you output to multicast. If I try the
> >> same command outputting to a file, it works fine (my guess is that any
> >> network-based output exhibits this behavior).
> >>
> >> You will see logs like this:
> >>
> >> [Parsed_scale_cuda_1 @ 0x55da86a03340] w:1920 h:1080 fmt:nv12 -> w:768
> >> h:432 fmt:nv12
> >>
> >> And then about 1-2 seconds pass before another log comes out.
> >>
> >> Eventually (after many stalls and logs) this log comes out and the
> >> transcode stops:
> >>
> >> [vost#0:0/h264_nvenc @ 0x55da86a3f780] Error submitting a packet to the muxer: Cannot allocate memory
> >>
> >> I attached GDB to ffmpeg while it was stalled, and it is inside code
> >> trying to compile a CUDA script.
> >>
> >> If I'm not doing xstack (I'm pretty sure this has to do with multiple
> >> inputs) nvidia does not stall.
> >>
> >> Does anyone have any idea what is happening here? I launch ffmpeg from
> >> a C++ wrapper daemon. If that daemon is started via systemd, then nvidia
> >> transcodes with multiple inputs fail. However, if I launch my daemon by
> >> hand at a terminal, it works fine.
> >>
> >> Thanks
> >>
>
> > Paste the content of the systemd unit file here.
> > Logs from the same (systemctl status unit-name.service) will also assist.
> > That might help in understanding how and why the systemd unit is failing.
>
> systemd service file:
>
> [Unit]
> Description=Transcoder Service
> After=default.target
> StartLimitInterval=0
>
> [Service]
> Type=forking
> ExecStart=/opt/bin/videotranscoder
> Restart=always
> RestartSec=15
> TasksMax=infinity
> LimitCORE=infinity
>
> [Install]
> WantedBy=default.target
>
> Logs:
>
> * videotranscoder.service - Innovative Video Transcoder
>      Loaded: loaded (/lib/systemd/system/videotranscoder.service;
> disabled; vendor preset: enabled)
>      Active: active (running) since Mon 2024-11-18 16:11:09 CST; 3min 19s
> ago
>     Process: 50296 ExecStart=/opt/bin/videotranscoder (code=exited,
> status=0/SUCCESS)
>    Main PID: 50298 (videotranscoder)
>       Tasks: 81
>      Memory: 915.1M
>         CPU: 1min 1.870s
>      CGroup: /system.slice/videotranscoder.service
>              |-50298 /opt/bin/videotranscoder
>              |-50320 /bin/sh -c "/opt/bin/ffmpeg -y -threads 2 -nostats
> -nostdin -loglevel verbose -progress pipe:1 -probesize 5M -filter_threads 4
> -threads 2 -re -fflags +genpts -fflags discardcorrupt -hwaccel_device 3
> -extra_hw_frames 2 -hwaccel cuda -h>
>              `-50322 /opt/bin/ffmpeg -y -threads 2 -nostats -nostdin
> -loglevel verbose -progress pipe:1 -probesize 5M -filter_threads 4 -threads
> 2 -re -fflags +genpts -fflags discardcorrupt -hwaccel_device 3
> -extra_hw_frames 2 -hwaccel cuda -hwaccel_outpu>
>
> Nov 18 16:14:26 encoder10029unit4 videotranscoder:50296[50298]:
> FileTranscoder: [u:4,t:1,f:9f201704-a501-4e94-bce7-f3ac8e83a519.ts] Adding
> audio output: ac3, 6 channels, 384 kbps.
> Nov 18 16:14:26 encoder10029unit4 videotranscoder:50296[50298]:
> FileTranscoder: [u:4,t:1,f:9f201704-a501-4e94-bce7-f3ac8e83a519.ts] Audio
> bitrate is 0, defaulting audio bitrate to 128k for aac.
> Nov 18 16:14:26 encoder10029unit4 videotranscoder:50296[50298]:
> FileTranscoder: [u:4,t:1,f:9f201704-a501-4e94-bce7-f3ac8e83a519.ts] Adding
> audio output: aac, 2 channels, 128 kbps.
> Nov 18 16:14:26 encoder10029unit4 videotranscoder:50296[50298]:
> FileTranscoder: transcode ffmpeg cmd (starting): ffmpeg -hide_banner -y
> -nostats -hwaccel_device 1 -hwaccel cuvid  -i
> /video/vod/in/9f201704-a501-4e94-bce7-f3ac8e83a519.ts -filter_complex "hw>
> Nov 18 16:14:28 encoder10029unit4 videotranscoder:50296[50298]:
> VideoTranscodeApp: [u:4,t:3,p:1: 225.105.0.56:10102] [fifo @
> 0x55c5facc4840] Recovery attempt #1
> Nov 18 16:14:28 encoder10029unit4 videotranscoder:50296[50298]:
> VideoTranscodeApp: [u:4,t:3,p:1: 225.105.0.56:10102] [mpegts @
> 0x55c5f6bd2900] service 1 using PCR in pid=256, pcr_period=20ms
>                                                                 [mpegts @
> 0x55c5f6bd2900] muxrate 8238520,
> Nov 18 16:14:28 encoder10029unit4 videotranscoder:50296[50298]:
> VideoTranscodeApp: [u:4,t:3,p:1: 225.105.0.56:10102] sdt every 500 ms,
> pat/pmt every 100 ms
> Nov 18 16:14:28 encoder10029unit4 videotranscoder:50296[50298]:
> VideoTranscodeApp: [u:4,t:3,p:1: 225.105.0.56:10102] [fifo @
> 0x55c5facc4840] Recovery successful
> Nov 18 16:14:28 encoder10029unit4 videotranscoder:50296[50298]:
> VideoTranscodeApp: [u:4,t:3,p:1: 225.105.0.56:10102] [fifo @
> 0x55c5facc4840] FIFO queue flushed
> Nov 18 16:14:28 encoder10029unit4 videotranscoder:50296[50298]:
> VideoTranscodeApp: [u:4,t:3,p:1: 225.105.0.56:10102] [AVIOContext @
> 0x7fa8b4014300] Statistics: 5395788 bytes written, 0 seeks, 4657 writeouts
> Nov 18 16:14:29 encoder10029unit4 videotranscoder:50296[50298]:
> VideoTranscodeApp: [u:4,t:3,p:1: 225.105.0.56:10102] [fifo @
> 0x55c5facc4840] FIFO queue full
>

I see the problem.

Your output is emulating CBR over mpegts, but it's overshooting.
Lower your buffer size to about 5*(bitrate/fps). Assuming a frame rate of 30
fps and your 6000k video bitrate, use -bufsize:v 1000k or thereabouts.
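
For reference, here is a minimal sketch of that 5*(bitrate/fps) arithmetic in
shell. The function name bufsize_kbit is made up for illustration, and taking
the frame rate as a numerator/denominator pair is my own assumption so that
rates like 60000/1001 can be handled with integer math. Treat the result as a
starting point to tune from, not a definitive value.

```shell
# bufsize_kbit BITRATE_KBIT FPS_NUM [FPS_DEN]
# Prints 5 * (bitrate / fps) in kbit, rounded down.
# (Hypothetical helper name; not part of ffmpeg.)
bufsize_kbit() {
    bitrate_kbit=$1
    fps_num=$2
    fps_den=${3:-1}   # the denominator defaults to 1 for integer frame rates
    # Multiply before dividing so integer arithmetic loses as little as possible.
    echo $(( 5 * bitrate_kbit * fps_den / fps_num ))
}

bufsize_kbit 6000 30           # 6000k at 30 fps: prints 1000
bufsize_kbit 6000 60000 1001   # 6000k at 60000/1001 fps: prints 500
```

With the 30 fps assumption, that would mean passing -bufsize:v 1000k in place
of the -bufsize:v 12000k in the quoted command. If the output really runs at
the 60000/1001 rate set by the fps filter, the same rule gives roughly 500k.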
