[FFmpeg-devel] [PATCH] avfilter: add vf_overlay_cuda

Dennis Mungai dmngaie at gmail.com
Wed Apr 1 16:50:31 EEST 2020


On Wed, 1 Apr 2020 at 16:43, Alex <3.14pi at ukr.net> wrote:

> Hi! Is it working? I have tried everything but constantly get an error
> from overlay_cuda:
>
>
> ffmpeg -y -init_hw_device cuda=cuda -filter_hw_device cuda -hwaccel cuvid
> -c:v h264_cuvid -resize 1920x1080 -i 720p.mp4 -i watermark.png
> -filter_complex
> "[1:v]format=nv12,hwupload[img];[0:v][img]overlay_cuda=x=50:y=800[out]"
> -map [out] -c:v h264_nvenc -b:v 6M -an -preset fast  -y
> out_nvenc_overlay.mp4
> ...
> ffmpeg version git-2020-04-01-afa5e38
> ...
> [h264_cuvid @ 000001dd1b356d00] CUVID capabilities for h264_cuvid:
> [h264_cuvid @ 000001dd1b356d00] 8 bit: supported: 1, min_width: 48,
> max_width: 4096, min_height: 16, max_height: 4096
> [h264_cuvid @ 000001dd1b356d00] 10 bit: supported: 0, min_width: 0,
> max_width: 0, min_height: 0, max_height: 0
> [h264_cuvid @ 000001dd1b356d00] 12 bit: supported: 0, min_width: 0,
> max_width: 0, min_height: 0, max_height: 0
> Stream mapping:
>   Stream #0:0 (h264_cuvid) -> overlay_cuda:main
>   Stream #1:0 (png) -> format
>   overlay_cuda -> Stream #0:0 (h264_nvenc)
> Press [q] to stop, [?] for help
> [h264_cuvid @ 000001dd1b356d00] Formats: Original: cuda | HW: cuda | SW:
> nv12
> [graph 0 input from stream 1:0 @ 000001dd2e84a100] w:1894 h:302
> pixfmt:rgba tb:1/25 fr:25/1 sar:11811/11811
> [graph 0 input from stream 0:0 @ 000001dd2e84ae00] w:1920 h:1080
> pixfmt:cuda tb:1/24000 fr:24000/1001 sar:1/1
> [auto_scaler_0 @ 000001dd2ebf4cc0] w:iw h:ih flags:'bilinear' interl:0
> [Parsed_format_0 @ 000001dd2e849780] auto-inserting filter 'auto_scaler_0'
> between the filter 'graph 0 input from stream 1:0' and the filter
> 'Parsed_format_0'
> [auto_scaler_0 @ 000001dd2ebf4cc0] w:1894 h:302 fmt:rgba sar:11811/11811
> -> w:1894 h:302 fmt:nv12 sar:1/1 flags:0x2
> [overlay_cuda @ 000001dd2ebc87c0] cu->cuModuleLoadData(&ctx->cu_module,
> vf_overlay_cuda_ptx) failed -> CUDA_ERROR_INVALID_IMAGE: device kernel
> image is invalid
> [Parsed_overlay_cuda_2 @ 000001dd2e84b6c0] Failed to configure output pad
> on Parsed_overlay_cuda_2
> Error reinitializing filters!
> Failed to inject frame into filter network: Generic error in an external
> library
> Error while processing the decoded data for stream #0:0
> ...
>
>
>
> --- Original message ---
> From: "Yaroslav Pogrebnyak" <yyyaroslav at gmail.com>
> Date: 18 March 2020, 09:29:15
>
> This patch adds the 'vf_overlay_cuda' filter.
> It draws one picture on top of another on a CUDA GPU.
> For the end-user, it's similar to 'vf_overlay_opencl' and other overlay
> filters.
>
> This filter would be especially useful for building video processing
> pipelines that execute fully on the CUDA GPU. For example, the following
> pipeline would be possible: decode -> scale -> overlay -> encode, without
> copying frames between CPU and GPU in between.
>
> Technical details.
>
> Supported sw input formats are NV12 and YUV420P for main input, and NV12,
> YUV420P and YUVA420P for overlay input.
> Main and overlay sw formats should match (i.e., overlaying YUVA420P on NV12
> is not implemented).
> All pixel format conversions must be done with the 'format' or
> 'scale_npp' filters before 'overlay_cuda'.
>
> It was necessary to slightly modify 'hwcontext_cuda.c' to allow overlays
> with an alpha channel:
>  - Allow AV_PIX_FMT_YUVA420P to enable hwuploading frames with an alpha
> channel to the GPU.
>  - Do not shift the height of the 4th plane (alpha) when uploading to the GPU.
>
> Examples.
>
> - Overlay picture on top of video (main: YUVJ420P->NV12, overlay: NV12)
> $ ffmpeg -y -init_hw_device cuda=cuda -filter_hw_device cuda -hwaccel
> cuvid \
>   -c:v h264_cuvid -i main.mp4 \
>   -i ~/overlay.jpg \
>   -filter_complex "[1:v]format=nv12, hwupload[overlay],
> [0:v][overlay]overlay_cuda=x=0:y=0:shortest=false" \
>   -an -c:v h264_nvenc -b:v 5M output.mp4
>
> - Overlay one video on top of another (main: NV12, overlay: NV12)
> $ ffmpeg -y \
>   -hwaccel cuvid -c:v h264_cuvid -i main.mp4 \
>   -hwaccel cuvid -c:v h264_cuvid -i overlay.mp4 \
>   -filter_complex "[1:v]scale_npp=512:-1[o],
> [v:0][o]overlay_cuda=x=100:y=100:shortest=true" \
>   -an -c:v h264_nvenc -b:v 5M output.mp4
>
> - Overlay picture with alpha channel on top of video (main: NV12->YUV420P,
> overlay: RGBA->YUVA420P)
> $ ffmpeg -y \
>   -init_hw_device cuda=cuda -filter_hw_device cuda -hwaccel cuvid \
>   -c:v h264_cuvid -i ~/main.mp4 \
>   -i ~/overlay.png \
>   -filter_complex "[1:v]format=yuva420p, hwupload[o],
> [v:0]scale_npp=format=yuv420p[m],
> [m][o]overlay_cuda=x=0:y=0:shortest=false" \
>   -an -c:v h264_nvenc -b:v 5M output.mp4
>
> Patch attached.
>
> P.S. This is my first patch, I would be grateful for any feedback to know
> if I'm doing things correctly or not.
> Thanks!
>
>
> Signed-off-by: Yaroslav Pogrebnyak <yyyaroslav at gmail.com>
> ---
>  configure                      |   2 +
>  libavfilter/Makefile           |   1 +
>  libavfilter/allfilters.c       |   1 +
>  libavfilter/vf_overlay_cuda.c  | 451 +++++++++++++++++++++++++++++++++
>  libavfilter/vf_overlay_cuda.cu |  54 ++++
>  libavutil/hwcontext_cuda.c     |   3 +-
>  6 files changed, 511 insertions(+), 1 deletion(-)
>  create mode 100644 libavfilter/vf_overlay_cuda.c
>  create mode 100644 libavfilter/vf_overlay_cuda.cu
>
>
>
>

How does the NVDEC path work out?

Try this:

ffmpeg -y -init_hw_device cuda=cuda -filter_hw_device cuda -hwaccel cuda
-hwaccel_output_format cuda -i 720p.mp4 -i watermark.png -filter_complex
"[1:v]format=nv12,hwupload[img];[0:v][img]overlay_cuda=x=50:y=800[out]"
-map [out] -c:v h264_nvenc -b:v 6M -an -preset fast  -y
out_nvenc_overlay.mp4
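
If you want to script this, here is a minimal sketch of the same command for a POSIX shell. It only reuses the options shown above. The build_overlay_graph helper is hypothetical (it is not part of FFmpeg), and the input file names are assumed from your report. The [out] label is quoted so the shell cannot treat it as a glob pattern, and ffmpeg is only invoked when it is installed and both inputs exist.

```shell
#!/bin/sh
# Hypothetical helper (not part of FFmpeg): assembles the filtergraph string
# used in the command above so the overlay offsets can be varied.
build_overlay_graph() {
    # $1 = x offset, $2 = y offset of the overlay on the main video
    printf '[1:v]format=nv12,hwupload[img];[0:v][img]overlay_cuda=x=%s:y=%s[out]' "$1" "$2"
}

graph=$(build_overlay_graph 50 800)
echo "$graph"

# Assumed input names, taken from the original report.
main_input=720p.mp4
overlay_input=watermark.png

# Run ffmpeg only if it is available and both inputs are present.
if command -v ffmpeg >/dev/null 2>&1 && [ -f "$main_input" ] && [ -f "$overlay_input" ]; then
    ffmpeg -y -init_hw_device cuda=cuda -filter_hw_device cuda \
        -hwaccel cuda -hwaccel_output_format cuda \
        -i "$main_input" -i "$overlay_input" \
        -filter_complex "$graph" -map '[out]' \
        -c:v h264_nvenc -b:v 6M -an -preset fast \
        out_nvenc_overlay.mp4
fi
```

Echoing the graph first makes it easy to check the string that ffmpeg will receive before the run.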

