[FFmpeg-user] Nvidia Transcoding: Failing Using xstack (When Running Under systemd)
Shane Warren
shanew at innovsys.com
Wed Nov 27 17:22:43 EET 2024
-----Original Message-----
From: ffmpeg-user <ffmpeg-user-bounces at ffmpeg.org> On Behalf Of Shane Warren
Sent: Tuesday, November 26, 2024 1:35 PM
To: FFmpeg user questions <ffmpeg-user at ffmpeg.org>
Subject: Re: [FFmpeg-user] Nvidia Transcoding: Failing Using xstack (When Running Under systemd)
On Tue, 19 Nov 2024, 01:20 Shane Warren, <shanew at innovsys.com> wrote:
>> On Mon, 18 Nov 2024, 11:33 pm Shane Warren, <shanew at innovsys.com> wrote:
>>
>> >> I have been trying to track down why, when transcoding using xstack
>> >> with nvidia decoding and encoding, I get strange decoding issues in ffmpeg.
>> >>
>> >> Note: I use two 1-minute-long .ts files for this example. If you want
>> >> my inputs, they are available here (as input1.ts and input2.ts):
>> >>
>> >> https://drive.google.com/drive/folders/1mZ8xiNvz5ez1ULlNsy5a3KhnhaqQ2Hgo?usp=drive_link
>> >>
>> >> I got the latest ffmpeg and tried this command (xstacking 2 videos into 1 output):
>> >>
>> >> ffmpeg -y -threads 2 -nostats -loglevel verbose -probesize 5M \
>> >>   -filter_threads 4 -threads 2 -re -fflags +genpts -fflags discardcorrupt \
>> >>   -extra_hw_frames 2 -hwaccel cuda -hwaccel_output_format cuda -threads 2 \
>> >>   -thread_queue_size 4096 -heavy_compr 1 -thread_queue_size 4096 -re -i input1.ts \
>> >>   -extra_hw_frames 2 -hwaccel cuda -hwaccel_output_format cuda -threads 2 \
>> >>   -thread_queue_size 4096 -heavy_compr 1 -thread_queue_size 4096 -re -i input2.ts \
>> >>   -filter_complex "\
>> >>     [0:v:0]yadif_cuda=deint=interlaced,scale_cuda=768:432,hwdownload,format=nv12,fps=60000/1001[v0]; \
>> >>     [1:v:0]yadif_cuda=deint=interlaced,scale_cuda=768:432,hwdownload,format=nv12,fps=60000/1001[v1]; \
>> >>     [v0][v1]xstack=inputs=2:layout=0_0|0_h0[mosaic]; \
>> >>     [mosaic]hwupload_cuda,scale_cuda=w=1280:h=720:format=yuv420p:force_original_aspect_ratio=decrease,hwdownload,format=yuv420p,pad=1280:720:(ow-iw)/2:(oh-ih)/2,hwupload_cuda[out0]" \
>> >>   -filter:a:0 "aresample=async=10000,volume=1.00" -c:a:0 ac3 -threads 2 \
>> >>   -ac:a:0 6 -ar:a:0 48000 -b:a:0 384k \
>> >>   -filter:a:1 "aresample=async=10000,volume=1.00" -c:a:1 ac3 -threads 2 \
>> >>   -ac:a:1 6 -ar:a:1 48000 -b:a:1 384k \
>> >>   -map "[out0]" -map "0:a:0" -map "1:a:0" \
>> >>   -c:v h264_nvenc -b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 12000k \
>> >>   -a53cc 1 -tune ll -zerolatency 1 -cbr 1 -forced-idr 1 -strict_gop 1 \
>> >>   -threads 2 -profile:v high -level:v 4.2 -bf:v 0 -g:v 30 \
>> >>   -f mpegts -muxrate 8238520 -pes_payload_size 1528 \
>> >>   "udp://@225.105.0.37:10102?pkt_size=1316&bitrate=8238520&burst_bits=10528&ttl=64"
>> >>
>> >> If you run that command in Ubuntu 22.04 it works 100% fine and
>> >> transcodes till the end of the input file(s).
>> >>
>> >> What doesn't work is starting that process under systemd
>> >> non-interactively, like so:
>> >>
>> >> systemd-run -S
>> >>
>> >> If you then run that same command, it will fail in a strange way.
>> >>
>> >> Note: It's important that you output to multicast; if I try the same
>> >> command outputting to a file, it works fine (my guess is that any
>> >> network-based output exhibits this behavior).
>> >>
>> >> You will see logs like this:
>> >>
>> >> [Parsed_scale_cuda_1 @ 0x55da86a03340] w:1920 h:1080 fmt:nv12 -> w:768 h:432 fmt:nv12
>> >>
>> >> And then about 1-2 seconds pass before another log comes out.
>> >>
>> >> Eventually (after many stalls and logs) this log comes out and the
>> >> transcode stops:
>> >>
>> >> [vost#0:0/h264_nvenc @ 0x55da86a3f780] Error submitting a packet to the muxer: Cannot allocate memory
>> >>
>> >> I attached GDB to ffmpeg while it is stalled, and it is stuck inside a
>> >> call that is compiling a CUDA kernel.
>> >>
>> >> If I'm not doing xstack (I'm pretty sure this has to do with multiple
>> >> inputs), nvidia does not stall.
>> >>
>> >> Does anyone have any idea what is happening here? I launch ffmpeg from
>> >> a C++ wrapper daemon; if that daemon is started via systemd, then nvidia
>> >> transcodes with multiple inputs fail. However, if I launch my daemon by
>> >> hand at a terminal, it works fine.
>> >>
>> >> Thanks
>> >>
>>
>> > Paste the content of the systemd unit file here.
>> > Logs from the same (systemctl status unit-name.service) will also assist.
>> > That might help in understanding how and why the systemd unit is failing.
>>
>> systemd service file:
>>
>> [Unit]
>> Description=Transcoder Service
>> After=default.target
>> StartLimitInterval=0
>>
>> [Service]
>> Type=forking
>> ExecStart=/opt/bin/videotranscoder
>> Restart=always
>> RestartSec=15
>> TasksMax=infinity
>> LimitCORE=infinity
>>
>> [Install]
>> WantedBy=default.target
>>
>> Logs:
>>
>> * videotranscoder.service - Innovative Video Transcoder
>> Loaded: loaded (/lib/systemd/system/videotranscoder.service;
>> disabled; vendor preset: enabled)
>> Active: active (running) since Mon 2024-11-18 16:11:09 CST; 3min
>> 19s ago
>> Process: 50296 ExecStart=/opt/bin/videotranscoder (code=exited,
>> status=0/SUCCESS)
>> Main PID: 50298 (videotranscoder)
>> Tasks: 81
>> Memory: 915.1M
>> CPU: 1min 1.870s
>> CGroup: /system.slice/videotranscoder.service
>> |-50298 /opt/bin/videotranscoder
>> |-50320 /bin/sh -c "/opt/bin/ffmpeg -y -threads 2
>> -nostats -nostdin -loglevel verbose -progress pipe:1 -probesize 5M
>> -filter_threads 4 -threads 2 -re -fflags +genpts -fflags
>> discardcorrupt -hwaccel_device 3 -extra_hw_frames 2 -hwaccel cuda -h>
>> `-50322 /opt/bin/ffmpeg -y -threads 2 -nostats -nostdin
>> -loglevel verbose -progress pipe:1 -probesize 5M -filter_threads 4
>> -threads
>> 2 -re -fflags +genpts -fflags discardcorrupt -hwaccel_device 3
>> -extra_hw_frames 2 -hwaccel cuda -hwaccel_outpu>
>>
>> Nov 18 16:14:26 encoder10029unit4 videotranscoder:50296[50298]:
>> FileTranscoder: [u:4,t:1,f:9f201704-a501-4e94-bce7-f3ac8e83a519.ts]
>> Adding audio output: ac3, 6 channels, 384 kbps.
>> Nov 18 16:14:26 encoder10029unit4 videotranscoder:50296[50298]:
>> FileTranscoder: [u:4,t:1,f:9f201704-a501-4e94-bce7-f3ac8e83a519.ts]
>> Audio bitrate is 0, defaulting audio bitrate to 128k for aac.
>> Nov 18 16:14:26 encoder10029unit4 videotranscoder:50296[50298]:
>> FileTranscoder: [u:4,t:1,f:9f201704-a501-4e94-bce7-f3ac8e83a519.ts]
>> Adding audio output: aac, 2 channels, 128 kbps.
>> Nov 18 16:14:26 encoder10029unit4 videotranscoder:50296[50298]:
>> FileTranscoder: transcode ffmpeg cmd (starting): ffmpeg -hide_banner
>> -y -nostats -hwaccel_device 1 -hwaccel cuvid -i
>> /video/vod/in/9f201704-a501-4e94-bce7-f3ac8e83a519.ts -filter_complex "hw>
>> Nov 18 16:14:28 encoder10029unit4 videotranscoder:50296[50298]:
>> VideoTranscodeApp: [u:4,t:3,p:1: 225.105.0.56:10102] [fifo @ 0x55c5facc4840] Recovery attempt #1
>> Nov 18 16:14:28 encoder10029unit4 videotranscoder:50296[50298]:
>> VideoTranscodeApp: [u:4,t:3,p:1: 225.105.0.56:10102] [mpegts @ 0x55c5f6bd2900] service 1 using PCR in pid=256, pcr_period=20ms
>> Nov 18 16:14:28 encoder10029unit4 videotranscoder:50296[50298]:
>> VideoTranscodeApp: [u:4,t:3,p:1: 225.105.0.56:10102] [mpegts @ 0x55c5f6bd2900] muxrate 8238520, sdt every 500 ms, pat/pmt every 100 ms
>> Nov 18 16:14:28 encoder10029unit4 videotranscoder:50296[50298]:
>> VideoTranscodeApp: [u:4,t:3,p:1: 225.105.0.56:10102] [fifo @ 0x55c5facc4840] Recovery successful
>> Nov 18 16:14:28 encoder10029unit4 videotranscoder:50296[50298]:
>> VideoTranscodeApp: [u:4,t:3,p:1: 225.105.0.56:10102] [fifo @ 0x55c5facc4840] FIFO queue flushed
>> Nov 18 16:14:28 encoder10029unit4 videotranscoder:50296[50298]:
>> VideoTranscodeApp: [u:4,t:3,p:1: 225.105.0.56:10102] [AVIOContext @ 0x7fa8b4014300] Statistics: 5395788 bytes written, 0 seeks, 4657 writeouts
>> Nov 18 16:14:29 encoder10029unit4 videotranscoder:50296[50298]:
>> VideoTranscodeApp: [u:4,t:3,p:1: 225.105.0.56:10102] [fifo @ 0x55c5facc4840] FIFO queue full
>>
>> I see the problem.
>>
>> Your output is emulating CBR over mpegts, but it's overshooting.
>> Lower your bufsize to about 5*(bitrate/fps). Assuming a frame rate of 30 fps, use -bufsize:v 1000k or thereabouts.
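As a sanity check on that rule of thumb (my own illustration, not from the thread), here is how the suggested bufsize works out for the 6000k stream in the command above:

```python
# Rule of thumb quoted above: bufsize ~= 5 * (bitrate / fps),
# i.e. roughly five frames' worth of data for a CBR-style mpegts output.

def suggested_bufsize(bitrate_bps: int, fps: float) -> int:
    """Return a CBR-friendly bufsize in bits per second units."""
    return int(5 * (bitrate_bps / fps))

bitrate = 6_000_000   # matches -b:v 6000k from the command in this thread
fps = 30.0            # assumed frame rate, as in the advice above

print(suggested_bufsize(bitrate, fps))  # 1000000, i.e. -bufsize:v 1000k
```

This is far below the -bufsize:v 12000k used in the original command, which is the overshoot being pointed out.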
>First, thanks for that info; I was never quite sure what bufsize was correct. However, after changing to that bufsize I get the same behavior.
>
>The key thing is I ran this under systemd-run for a reason: I was trying to show the simplest way to make this happen. I'm running a stock Ubuntu 22.04 with CUDA 12.4 and the latest stable nvidia driver. If anyone has an nvidia-enabled ffmpeg on Ubuntu 22.04 with an nvidia card, this will fail for them too.
>
>I'm baffled why starting it from an interactive terminal (ssh or directly on a connected keyboard/monitor) works fine, but if I start it from systemd-run, or if it's started by a systemd script (like on a reboot or package install), it exhibits this behavior.
> I have some more details on this: when it is stalled calling scale_cuda, I see it stuck in this call stack:
>
> #0 0x00007f348129b4e0 in ?? () from /lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.1
> #1 0x00007f34812444b8 in ?? () from /lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.1
> #2 0x00007f3481047ce5 in ?? () from /lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.1
> #3 0x00007f3480495ecb in ?? () from /lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.1
> #4 0x00007f348049600b in ?? () from /lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.1
> #5 0x00007f348045ff87 in ?? () from /lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.1
> #6 0x00007f3480ff7d14 in ?? () from /lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.1
> #7 0x00007f3480ff7daf in ?? () from /lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.1
> #8 0x00007f34803331bc in ?? () from /lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.1
> #9 0x00007f348033bcc1 in ?? () from /lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.1
>#10 0x00007f3480340cd3 in ?? () from /lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.1
> #11 0x00007f3480343c65 in ?? () from /lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.1
> #12 0x00007f3480334789 in __cuda_CallJitEntryPoint () from /lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.1
> #13 0x00007f359d166780 in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1
> #14 0x00007f359d15a507 in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1
> #15 0x00007f359cea6dc4 in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1
> #16 0x00007f359cec87e3 in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1
> #17 0x00007f359cdde904 in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1
> #18 0x00007f359cf13d4b in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1
> #19 0x000055ab2bf90dd2 in ff_cuda_load_module (avctx=avctx at entry=0x55ab42468d00, hwctx=<optimized out>, cu_module=cu_module at entry=0x55ab42dfa310, data=<optimized out>, length=<optimized out>) at libavfilter/cuda/load_helper.c:90
> #20 0x000055ab2bc3d80e in cudascale_load_functions (ctx=0x55ab42468d00) at libavfilter/vf_scale_cuda.c:323
> #21 cudascale_config_props (outlink=<optimized out>) at libavfilter/vf_scale_cuda.c:393
>
> If I'm watching htop I see a single core (I have 40 cores) go to 100% CPU for maybe 5-10 seconds while the other cores are idle. My theory is that when scale_cuda runs in parallel across N inputs under systemd, the parallel instances all do their JIT compile on the same core and are fighting over it.
>
> If I'm doing a single input stream, scale_cuda never takes 5-10 seconds, and if I'm running this command directly (not started from systemd), a scale_cuda call also doesn't take 5-10 seconds, even when doing 6 inputs.
>
> I've tried messing with systemd config file options for my process (CPUAffinity was tried); nothing seems to stop this behavior.
>
> Any ideas here? I'm running out of things to try.
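One way to probe the single-core theory above (my suggestion, not something from the thread) is to compare the allowed-CPU set of the process when launched interactively versus under systemd; on Linux the kernel exposes it in /proc:

```shell
# Print the set of CPUs this process may run on. From an interactive
# shell this is normally the full range (e.g. 0-39 on a 40-core box);
# if the systemd unit is constraining the process, the list is narrower.
grep Cpus_allowed_list /proc/self/status

# For an already-running ffmpeg, substitute its PID (pgrep usage is
# illustrative; adjust the process name to match your setup):
# grep Cpus_allowed_list /proc/"$(pgrep -n ffmpeg)"/status
```

If the list matches in both cases, the stall is not a simple affinity restriction and the contention must come from somewhere else.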
I finally have a fix for this; all CUDA-based filters work now. Thanks go to Dennis Mungai for showing me how to fix this.
These parameters were added to the [Service] section of my systemd service file:
[Service]
Restart=always
User=root
LimitNOFILE=900000
TasksMax=900000
LimitNPROC=900000
Environment=CUDA_DEVICE_ORDER=PCI_BUS_ID
From Dennis's Notes:
(a). LimitNOFILE=900000 : Sets the open-file limit to a high value, i.e. 900000.
(b). TasksMax=900000 : Sets the maximum number of concurrent tasks spawned per systemd unit to 900000.
(c). LimitNPROC=900000 : Sets the maximum number of processes/threads the unit may create to 900000, to prevent thread starvation.
(d). Environment=CUDA_DEVICE_ORDER=PCI_BUS_ID : A CUDA-specific environment variable that prevents random device re-ordering, keeping the GPU device indexes consistent.
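For anyone applying the same fix to a packaged unit, the standard systemd way to add these settings without editing the shipped unit file is a drop-in (the unit name here matches the service shown earlier in the thread):

```ini
# /etc/systemd/system/videotranscoder.service.d/override.conf
# Create with `systemctl edit videotranscoder.service`, then apply with
# `systemctl daemon-reload && systemctl restart videotranscoder.service`.
[Service]
LimitNOFILE=900000
TasksMax=900000
LimitNPROC=900000
Environment=CUDA_DEVICE_ORDER=PCI_BUS_ID
```

You can confirm the values took effect with `systemctl show -p TasksMax -p LimitNOFILE -p LimitNPROC videotranscoder.service`.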
_______________________________________________
ffmpeg-user mailing list
ffmpeg-user at ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-user
To unsubscribe, visit link above, or email ffmpeg-user-request at ffmpeg.org with subject "unsubscribe".