[FFmpeg-devel] [PATCH] CUDA - make it work for multiple GPU architectures

Patrick Ecord pecord at gmail.com
Fri Mar 12 17:14:52 EET 2021


Hello, 

My friend was running into issues trying to compile ffmpeg with CUDA support, so I tried to replicate the issue on my machine with my 1070.

Started by following Nvidia’s guide for compiling with CUDA support -
https://docs.nvidia.com/video-technologies/video-codec-sdk/ffmpeg-with-nvidia-gpu/

It uses the wrong flag (`--enable-cuda-sdk` instead of `--enable-cuda-nvcc`), but I got that figured out.

Then when I tried to run ./configure with the right flag, I got `nvcc fatal : Unsupported gpu architecture 'compute_30'`.

Googled that and found this GitHub issue, where one person suggested changing the `nvccflags_default` flags and said: "I went with 75 because I'm on Turing architecture"
https://github.com/NVIDIA/cuda-samples/issues/46 

Started looking around for which flags I would want to use and found this webpage that lists which cards are supported by which CUDA versions.
https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/

I had just installed nvcc from Nvidia’s site and it came with CUDA 11. The page has a section with flags for CUDA 11 with compatibility for "V100 and T4 Turing cards, but also support newer RTX 3080 and other Ampere cards".

Also, according to that person’s site, a lot of the older cards got dropped with CUDA 8, 9, 10 and now 11. These flags should cover Maxwell (sm_52) and up:

```
-arch=sm_52 \
-gencode=arch=compute_52,code=sm_52 \
-gencode=arch=compute_60,code=sm_60 \
-gencode=arch=compute_61,code=sm_61 \
-gencode=arch=compute_70,code=sm_70 \
-gencode=arch=compute_75,code=sm_75 \
-gencode=arch=compute_80,code=sm_80 \
-gencode=arch=compute_86,code=sm_86 \
-gencode=arch=compute_86,code=compute_86
```
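
For illustration, the same list can be generated from the compute capability numbers instead of being typed out by hand. This is only a sketch: `gencode_flags` is a hypothetical helper name, not something that exists in FFmpeg's configure. It emits one `sm_XX` target per capability plus a final `code=compute_XX` entry, which embeds PTX for the newest architecture, matching the last line of the list above.

```shell
# Hypothetical helper (not part of FFmpeg's configure):
# build nvcc -gencode flags from a list of compute capabilities.
gencode_flags() {
    flags=""
    last=""
    for cc in "$@"; do
        # One native (SASS) target per compute capability.
        flags="$flags -gencode=arch=compute_${cc},code=sm_${cc}"
        last="$cc"
    done
    # Also embed PTX for the last capability given.
    if [ -n "$last" ]; then
        flags="$flags -gencode=arch=compute_${last},code=compute_${last}"
    fi
    # Strip the leading space before printing.
    printf '%s\n' "${flags# }"
}

gencode_flags 52 60 61 70 75 80 86
```

The capabilities must be passed in ascending order for the PTX entry to target the newest one.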

Tried that and ran configure, and it failed with "Option '--ptx (-ptx)' is not allowed when compiling for multiple GPU architectures".

So I removed the `-ptx` flag and I was able to run configure and make and make install without any errors.
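
As the error message says, `-ptx` cannot be combined with multiple GPU architectures. One possible alternative to dropping the flag unconditionally would be to add it only when a single target is requested. The sketch below is hypothetical and untested against configure itself: `count_gencode` and `maybe_add_ptx` are made-up names, and counting `-gencode` entries is only an approximation of what nvcc treats as "multiple architectures".

```shell
# Hypothetical helpers, not part of FFmpeg's configure.

# Count the -gencode entries in a flag string
# (matches both "-gencode arch=..." and "-gencode=arch=...").
count_gencode() {
    n=0
    for flag in $1; do
        case "$flag" in
            -gencode*) n=$((n + 1)) ;;
        esac
    done
    echo "$n"
}

# Append -ptx only when at most one -gencode target is requested.
maybe_add_ptx() {
    if [ "$(count_gencode "$1")" -le 1 ]; then
        echo "$1 -ptx"
    else
        echo "$1"
    fi
}

maybe_add_ptx "-gencode arch=compute_52,code=sm_52 -O2"
```

With a single target the flags come back with `-ptx` appended; with the multi-architecture list they come back unchanged.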

Tested by converting Big Buck Bunny and it played fine.
ffmpeg -y -vsync 0 -hwaccel cuda -hwaccel_output_format cuda -i ./Big_Buck_Bunny_1080_10s_30MB.mp4 -c:a copy -c:v h264_nvenc -b:v 5M output.mp4

Other stuff - 
I am not really a CUDA expert so I am not sure if this is the "correct" way so let me know if there is a better way of doing it.
I haven't tried timing it to see if there is a slow down from supporting multiple architectures and not using the -ptx flag.
I saw there were also flags for clang. I haven't tried messing with that yet, but my understanding is you can pass the flag multiple times.
"You can pass --cuda-gpu-arch multiple times to compile for multiple archs." - https://llvm.org/docs/CompileCudaWithLLVM.html
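
A rough sketch of what the clang equivalent might look like, going only by the LLVM quote above. `clang_arch_flags` is a hypothetical helper name, and I have not checked whether multiple architectures work together with the `-S --cuda-device-only` options that configure passes on the clang path.

```shell
# Hypothetical helper (not part of FFmpeg's configure):
# repeat --cuda-gpu-arch once per compute capability for clang.
clang_arch_flags() {
    flags=""
    for cc in "$@"; do
        flags="$flags --cuda-gpu-arch=sm_${cc}"
    done
    # Strip the leading space before printing.
    printf '%s\n' "${flags# }"
}

clang_arch_flags 52 60 61 70 75 80 86
```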

Wanted to send what I had and see what you all think, 
Thanks

---
configure | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/configure b/configure
index d11942fced..d9e4eff592 100755
--- a/configure
+++ b/configure
@@ -4344,7 +4344,7 @@ fi

if enabled cuda_nvcc; then
    nvcc_default="nvcc"
-    nvccflags_default="-gencode arch=compute_30,code=sm_30 -O2"
+    nvccflags_default="-arch=sm_52 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86"
else
    nvcc_default="clang"
    nvccflags_default="--cuda-gpu-arch=sm_30 -O2"
@@ -6240,7 +6240,7 @@ else
fi

if enabled cuda_nvcc; then
-    nvccflags="$nvccflags -ptx"
+    nvccflags="$nvccflags"
else
    nvccflags="$nvccflags -S -nocudalib -nocudainc --cuda-device-only -Wno-c++11-narrowing -include ${source_link}/compat/cuda/cuda_runtime.h"
    check_nvcc cuda_llvm
-- 
2.29.2


