[FFmpeg-user] Compiling ffmpeg with NVENC and NVIDIA GRID K1

Thu Aug 27 08:52:50 CEST 2015

2015-06-29 18:12 GMT+08:00 Klaus Schürmann <ks at mediabeam.com>:

> Hello,
>
> I compiled ffmpeg with nvenc support. The compile process finished without
> any errors. But when I try to convert a file with nvenc, I get the error
> message "[nvenc @ 0x39dc1c0] CreateInputBuffer failed".
>
> Can somebody help me fix this problem?
>
> Best Regards
> Klaus Schuermann
>
> OS: Ubuntu 14.04.2 LTS
> NVidia driver: 346
>
> Here is the complete output of the conversion job:
>
> root@video-convert1:~/ffmpeg_sources/ffmpeg_libnvenc# ffmpeg -i /media/testfile.mkv -r 60 -s 1024x768 -vcodec nvenc -b:v 5750k testfile.mp4
> ffmpeg version N-73133-gd7e224e Copyright (c) 2000-2015 the FFmpeg developers
>   built with gcc 4.8 (Ubuntu 4.8.4-2ubuntu1~14.04)
>   configuration: --prefix=/root/ffmpeg_build --pkg-config-flags=--static --extra-cflags=-I/root/ffmpeg_build/include --extra-ldflags=-L/root/ffmpeg_build/lib --bindir=/root/bin --enable-gpl --enable-libass --enable-libfdk-aac --enable-libfreetype --enable-libmp3lame --enable-libopus --enable-libtheora --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libx265 --enable-nvenc --enable-nonfree
>   libavutil      54. 27.100 / 54. 27.100
>   libavcodec     56. 44.101 / 56. 44.101
>   libavformat    56. 38.101 / 56. 38.101
>   libavdevice    56.  4.100 / 56.  4.100
>   libavfilter     5. 18.100 /  5. 18.100
>   libswscale      3.  1.101 /  3.  1.101
>   libswresample   1.  2.100 /  1.  2.100
>   libpostproc    53.  3.100 / 53.  3.100
> Input #0, matroska,webm, from '/media/testfile.mkv':
>   Metadata:
>     encoder         : libebml v1.3.0 + libmatroska v1.4.1
>     creation_time   : 2014-09-29 00:31:12
>   Duration: 00:21:03.51, start: 0.000000, bitrate: 3015 kb/s
>     Stream #0:0(eng): Video: h264 (High), yuv420p(tv, bt709/unknown/unknown), 1280x720, SAR 1:1 DAR 16:9, 23.98 fps, 23.98 tbr, 1k tbn, 47.95 tbc (default)
>     Stream #0:1: Audio: ac3, 48000 Hz, 5.1(side), fltp, 448 kb/s (default)
> [nvenc @ 0x39dc1c0] CreateInputBuffer failed
> Output #0, mp4, to 'testfile.mp4':
>   Metadata:
>     encoder         : libebml v1.3.0 + libmatroska v1.4.1
>     Stream #0:0(eng): Video: h264, none, q=2-31, 128 kb/s, SAR 4:3 DAR 0:0, 60 fps (default)
>     Metadata:
>       encoder         : Lavc56.44.101 nvenc
>     Stream #0:1: Audio: aac, 0 channels, 128 kb/s (default)
>     Metadata:
>       encoder         : Lavc56.44.101 libfdk_aac
> Stream mapping:
>   Stream #0:0 -> #0:0 (h264 (native) -> h264 (nvenc))
>   Stream #0:1 -> #0:1 (ac3 (native) -> aac (libfdk_aac))
> Error while opening encoder for output stream #0:0 - maybe incorrect parameters such as bit_rate, rate, width or height
>
> Output of deviceQuery:
>
> root@video-convert1:~# NVIDIA_CUDA-7.0_Samples/1_Utilities/deviceQuery/deviceQuery
> NVIDIA_CUDA-7.0_Samples/1_Utilities/deviceQuery/deviceQuery Starting...
>
>  CUDA Device Query (Runtime API) version (CUDART static linking)
>
> Detected 4 CUDA Capable device(s)
>
> Device 0: "GRID K1"
>   CUDA Driver Version / Runtime Version          7.0 / 7.0
>   CUDA Capability Major/Minor version number:    3.0
>   Total amount of global memory:                 4096 MBytes (4294770688 bytes)
>   ( 1) Multiprocessors, (192) CUDA Cores/MP:     192 CUDA Cores
>   GPU Max Clock rate:                            850 MHz (0.85 GHz)
>   Memory Clock rate:                             891 Mhz
>   Memory Bus Width:                              128-bit
>   L2 Cache Size:                                 262144 bytes
>   Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
>   Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
>   Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
>   Total amount of constant memory:               65536 bytes
>   Total amount of shared memory per block:       49152 bytes
>   Total number of registers available per block: 65536
>   Warp size:                                     32
>   Maximum number of threads per multiprocessor:  2048
>   Maximum number of threads per block:           1024
>   Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
>   Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
>   Maximum memory pitch:                          2147483647 bytes
>   Texture alignment:                             512 bytes
>   Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
>   Run time limit on kernels:                     No
>   Integrated GPU sharing Host Memory:            No
>   Support host page-locked memory mapping:       Yes
>   Alignment requirement for Surfaces:            Yes
>   Device has ECC support:                        Disabled
>   Device supports Unified Addressing (UVA):      Yes
>   Device PCI Domain ID / Bus ID / location ID:   0 / 132 / 0
>   Compute Mode:
>      < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
>
> Device 1: "GRID K1"
>   CUDA Driver Version / Runtime Version          7.0 / 7.0
>   CUDA Capability Major/Minor version number:    3.0
>   Total amount of global memory:                 4096 MBytes (4294770688 bytes)
>   ( 1) Multiprocessors, (192) CUDA Cores/MP:     192 CUDA Cores
>   GPU Max Clock rate:                            850 MHz (0.85 GHz)
>   Memory Clock rate:                             891 Mhz
>   Memory Bus Width:                              128-bit
>   L2 Cache Size:                                 262144 bytes
>   Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
>   Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
>   Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
>   Total amount of constant memory:               65536 bytes
>   Total amount of shared memory per block:       49152 bytes
>   Total number of registers available per block: 65536
>   Warp size:                                     32
>   Maximum number of threads per multiprocessor:  2048
>   Maximum number of threads per block:           1024
>   Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
>   Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
>   Maximum memory pitch:                          2147483647 bytes
>   Texture alignment:                             512 bytes
>   Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
>   Run time limit on kernels:                     No
>   Integrated GPU sharing Host Memory:            No
>   Support host page-locked memory mapping:       Yes
>   Alignment requirement for Surfaces:            Yes
>   Device has ECC support:                        Disabled
>   Device supports Unified Addressing (UVA):      Yes
>   Device PCI Domain ID / Bus ID / location ID:   0 / 133 / 0
>   Compute Mode:
>      < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
>
> Device 2: "GRID K1"
>   CUDA Driver Version / Runtime Version          7.0 / 7.0
>   CUDA Capability Major/Minor version number:    3.0
>   Total amount of global memory:                 4096 MBytes (4294770688 bytes)
>   ( 1) Multiprocessors, (192) CUDA Cores/MP:     192 CUDA Cores
>   GPU Max Clock rate:                            850 MHz (0.85 GHz)
>   Memory Clock rate:                             891 Mhz
>   Memory Bus Width:                              128-bit
>   L2 Cache Size:                                 262144 bytes
>   Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
>   Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
>   Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
>   Total amount of constant memory:               65536 bytes
>   Total amount of shared memory per block:       49152 bytes
>   Total number of registers available per block: 65536
>   Warp size:                                     32
>   Maximum number of threads per multiprocessor:  2048
>   Maximum number of threads per block:           1024
>   Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
>   Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
>   Maximum memory pitch:                          2147483647 bytes
>   Texture alignment:                             512 bytes
>   Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
>   Run time limit on kernels:                     No
>   Integrated GPU sharing Host Memory:            No
>   Support host page-locked memory mapping:       Yes
>   Alignment requirement for Surfaces:            Yes
>   Device has ECC support:                        Disabled
>   Device supports Unified Addressing (UVA):      Yes
>   Device PCI Domain ID / Bus ID / location ID:   0 / 134 / 0
>   Compute Mode:
>      < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
>
> Device 3: "GRID K1"
>   CUDA Driver Version / Runtime Version          7.0 / 7.0
>   CUDA Capability Major/Minor version number:    3.0
>   Total amount of global memory:                 4096 MBytes (4294770688 bytes)
>   ( 1) Multiprocessors, (192) CUDA Cores/MP:     192 CUDA Cores
>   GPU Max Clock rate:                            850 MHz (0.85 GHz)
>   Memory Clock rate:                             891 Mhz
>   Memory Bus Width:                              128-bit
>   L2 Cache Size:                                 262144 bytes
>   Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
>   Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
>   Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
>   Total amount of constant memory:               65536 bytes
>   Total amount of shared memory per block:       49152 bytes
>   Total number of registers available per block: 65536
>   Warp size:                                     32
>   Maximum number of threads per multiprocessor:  2048
>   Maximum number of threads per block:           1024
>   Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
>   Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
>   Maximum memory pitch:                          2147483647 bytes
>   Texture alignment:                             512 bytes
>   Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
>   Run time limit on kernels:                     No
>   Integrated GPU sharing Host Memory:            No
>   Support host page-locked memory mapping:       Yes
>   Alignment requirement for Surfaces:            Yes
>   Device has ECC support:                        Disabled
>   Device supports Unified Addressing (UVA):      Yes
>   Device PCI Domain ID / Bus ID / location ID:   0 / 135 / 0
>   Compute Mode:
>      < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
> > Peer access from GRID K1 (GPU0) -> GRID K1 (GPU1) : Yes
> > Peer access from GRID K1 (GPU0) -> GRID K1 (GPU2) : Yes
> > Peer access from GRID K1 (GPU0) -> GRID K1 (GPU3) : Yes
> > Peer access from GRID K1 (GPU1) -> GRID K1 (GPU1) : No
> > Peer access from GRID K1 (GPU1) -> GRID K1 (GPU2) : Yes
> > Peer access from GRID K1 (GPU1) -> GRID K1 (GPU3) : Yes
> > Peer access from GRID K1 (GPU2) -> GRID K1 (GPU1) : Yes
> > Peer access from GRID K1 (GPU2) -> GRID K1 (GPU2) : No
> > Peer access from GRID K1 (GPU2) -> GRID K1 (GPU3) : Yes
> > Peer access from GRID K1 (GPU1) -> GRID K1 (GPU0) : Yes
> > Peer access from GRID K1 (GPU1) -> GRID K1 (GPU1) : No
> > Peer access from GRID K1 (GPU1) -> GRID K1 (GPU2) : Yes
> > Peer access from GRID K1 (GPU2) -> GRID K1 (GPU0) : Yes
> > Peer access from GRID K1 (GPU2) -> GRID K1 (GPU1) : Yes
> > Peer access from GRID K1 (GPU2) -> GRID K1 (GPU2) : No
> > Peer access from GRID K1 (GPU3) -> GRID K1 (GPU0) : Yes
> > Peer access from GRID K1 (GPU3) -> GRID K1 (GPU1) : Yes
> > Peer access from GRID K1 (GPU3) -> GRID K1 (GPU2) : Yes
>
> deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 7.0, CUDA Runtime Version = 7.0, NumDevs = 4, Device0 = GRID K1, Device1 = GRID K1, Device2 = GRID K1, Device3 = GRID K1
> Result = PASS
>

I get the same error message. What should we pay attention to? My GPU's
deviceQuery output is below:

[root@localhost release]# ./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Tesla K20c"
  CUDA Driver Version / Runtime Version          7.0 / 7.0
  CUDA Capability Major/Minor version number:    3.5
  Total amount of global memory:                 4800 MBytes (5032706048 bytes)
  (13) Multiprocessors, (192) CUDA Cores/MP:     2496 CUDA Cores
  GPU Max Clock rate:                            706 MHz (0.71 GHz)
  Memory Clock rate:                             2600 Mhz
  Memory Bus Width:                              320-bit
  L2 Cache Size:                                 1310720 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Enabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 4 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 7.0, CUDA Runtime Version = 7.0, NumDevs = 1, Device0 = Tesla K20c
Result = PASS