[FFmpeg-user] Compiling ffmpeg with NVENC and NVIDIA GRID K1
Steven Liu
lingjiujianke at gmail.com
Thu Aug 27 09:08:46 CEST 2015
2015-08-27 14:52 GMT+08:00 Steven Liu <lingjiujianke at gmail.com>:
>
> 2015-06-29 18:12 GMT+08:00 Klaus Schürmann <ks at mediabeam.com>:
>
>> Hello,
>>
>> I compiled ffmpeg with nvenc support. The compile process worked without
>> any error. But if I try to convert a file with nvenc I got the error
>> message "[nvenc @ 0x39dc1c0] CreateInputBuffer failed".
>>
>> Can somebody help me to fix this problem?
>>
>> Best Regards
>> Klaus Schuermann
>>
>> OS: Ubuntu 14.04.2 LTS
>> NVidia driver: 346
>>
>> Here is the complete output of the convert job:
>>
>> root at video-convert1:~/ffmpeg_sources/ffmpeg_libnvenc# ffmpeg -i
>> /media/testfile.mkv -r 60 -s 1024x768 -vcodec nvenc -b:v 5750k testfile.mp4
>> ffmpeg version N-73133-gd7e224e Copyright (c) 2000-2015 the FFmpeg
>> developers
>> built with gcc 4.8 (Ubuntu 4.8.4-2ubuntu1~14.04)
>> configuration: --prefix=/root/ffmpeg_build --pkg-config-flags=--static
>> --extra-cflags=-I/root/ffmpeg_build/include
>> --extra-ldflags=-L/root/ffmpeg_build/lib --bindir=/root/bin --enable-gpl
>> --enable-libass --enable-libfdk-aac --enable-libfreetype
>> --enable-libmp3lame --enable-libopus --enable-libtheora --enable-libvorbis
>> --enable-libvpx --enable-libx264 --enable-libx265 --enable-nvenc
>> --enable-nonfree
>> libavutil 54. 27.100 / 54. 27.100
>> libavcodec 56. 44.101 / 56. 44.101
>> libavformat 56. 38.101 / 56. 38.101
>> libavdevice 56. 4.100 / 56. 4.100
>> libavfilter 5. 18.100 / 5. 18.100
>> libswscale 3. 1.101 / 3. 1.101
>> libswresample 1. 2.100 / 1. 2.100
>> libpostproc 53. 3.100 / 53. 3.100
>> Input #0, matroska,webm, from '/media/testfile.mkv':
>> Metadata:
>> encoder : libebml v1.3.0 + libmatroska v1.4.1
>> creation_time : 2014-09-29 00:31:12
>> Duration: 00:21:03.51, start: 0.000000, bitrate: 3015 kb/s
>> Stream #0:0(eng): Video: h264 (High), yuv420p(tv,
>> bt709/unknown/unknown), 1280x720, SAR 1:1 DAR 16:9, 23.98 fps, 23.98 tbr,
>> 1k tbn, 47.95 tbc (default)
>> Stream #0:1: Audio: ac3, 48000 Hz, 5.1(side), fltp, 448 kb/s (default)
>> [nvenc @ 0x39dc1c0] CreateInputBuffer failed
>> Output #0, mp4, to 'testfile.mp4':
>> Metadata:
>> encoder : libebml v1.3.0 + libmatroska v1.4.1
>> Stream #0:0(eng): Video: h264, none, q=2-31, 128 kb/s, SAR 4:3 DAR
>> 0:0, 60 fps (default)
>> Metadata:
>> encoder : Lavc56.44.101 nvenc
>> Stream #0:1: Audio: aac, 0 channels, 128 kb/s (default)
>> Metadata:
>> encoder : Lavc56.44.101 libfdk_aac
>> Stream mapping:
>> Stream #0:0 -> #0:0 (h264 (native) -> h264 (nvenc))
>> Stream #0:1 -> #0:1 (ac3 (native) -> aac (libfdk_aac))
>> Error while opening encoder for output stream #0:0 - maybe incorrect
>> parameters such as bit_rate, rate, width or height
>>
>> Output of deviceQuery:
>>
>> root at video-convert1:~#
>> NVIDIA_CUDA-7.0_Samples/1_Utilities/deviceQuery/deviceQuery
>> NVIDIA_CUDA-7.0_Samples/1_Utilities/deviceQuery/deviceQuery Starting...
>>
>> CUDA Device Query (Runtime API) version (CUDART static linking)
>>
>> Detected 4 CUDA Capable device(s)
>>
>> Device 0: "GRID K1"
>> CUDA Driver Version / Runtime Version 7.0 / 7.0
>> CUDA Capability Major/Minor version number: 3.0
>> Total amount of global memory: 4096 MBytes (4294770688
>> bytes)
>> ( 1) Multiprocessors, (192) CUDA Cores/MP: 192 CUDA Cores
>> GPU Max Clock rate: 850 MHz (0.85 GHz)
>> Memory Clock rate: 891 Mhz
>> Memory Bus Width: 128-bit
>> L2 Cache Size: 262144 bytes
>> Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,
>> 65536), 3D=(4096, 4096, 4096)
>> Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
>> Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048
>> layers
>> Total amount of constant memory: 65536 bytes
>> Total amount of shared memory per block: 49152 bytes
>> Total number of registers available per block: 65536
>> Warp size: 32
>> Maximum number of threads per multiprocessor: 2048
>> Maximum number of threads per block: 1024
>> Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
>> Max dimension size of a grid size (x,y,z): (2147483647, 65535,
>> 65535)
>> Maximum memory pitch: 2147483647 bytes
>> Texture alignment: 512 bytes
>> Concurrent copy and kernel execution: Yes with 1 copy engine(s)
>> Run time limit on kernels: No
>> Integrated GPU sharing Host Memory: No
>> Support host page-locked memory mapping: Yes
>> Alignment requirement for Surfaces: Yes
>> Device has ECC support: Disabled
>> Device supports Unified Addressing (UVA): Yes
>> Device PCI Domain ID / Bus ID / location ID: 0 / 132 / 0
>> Compute Mode:
>> < Default (multiple host threads can use ::cudaSetDevice() with
>> device simultaneously) >
>>
>> Device 1: "GRID K1"
>> CUDA Driver Version / Runtime Version 7.0 / 7.0
>> CUDA Capability Major/Minor version number: 3.0
>> Total amount of global memory: 4096 MBytes (4294770688
>> bytes)
>> ( 1) Multiprocessors, (192) CUDA Cores/MP: 192 CUDA Cores
>> GPU Max Clock rate: 850 MHz (0.85 GHz)
>> Memory Clock rate: 891 Mhz
>> Memory Bus Width: 128-bit
>> L2 Cache Size: 262144 bytes
>> Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,
>> 65536), 3D=(4096, 4096, 4096)
>> Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
>> Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048
>> layers
>> Total amount of constant memory: 65536 bytes
>> Total amount of shared memory per block: 49152 bytes
>> Total number of registers available per block: 65536
>> Warp size: 32
>> Maximum number of threads per multiprocessor: 2048
>> Maximum number of threads per block: 1024
>> Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
>> Max dimension size of a grid size (x,y,z): (2147483647, 65535,
>> 65535)
>> Maximum memory pitch: 2147483647 bytes
>> Texture alignment: 512 bytes
>> Concurrent copy and kernel execution: Yes with 1 copy engine(s)
>> Run time limit on kernels: No
>> Integrated GPU sharing Host Memory: No
>> Support host page-locked memory mapping: Yes
>> Alignment requirement for Surfaces: Yes
>> Device has ECC support: Disabled
>> Device supports Unified Addressing (UVA): Yes
>> Device PCI Domain ID / Bus ID / location ID: 0 / 133 / 0
>> Compute Mode:
>> < Default (multiple host threads can use ::cudaSetDevice() with
>> device simultaneously) >
>>
>> Device 2: "GRID K1"
>> CUDA Driver Version / Runtime Version 7.0 / 7.0
>> CUDA Capability Major/Minor version number: 3.0
>> Total amount of global memory: 4096 MBytes (4294770688
>> bytes)
>> ( 1) Multiprocessors, (192) CUDA Cores/MP: 192 CUDA Cores
>> GPU Max Clock rate: 850 MHz (0.85 GHz)
>> Memory Clock rate: 891 Mhz
>> Memory Bus Width: 128-bit
>> L2 Cache Size: 262144 bytes
>> Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,
>> 65536), 3D=(4096, 4096, 4096)
>> Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
>> Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048
>> layers
>> Total amount of constant memory: 65536 bytes
>> Total amount of shared memory per block: 49152 bytes
>> Total number of registers available per block: 65536
>> Warp size: 32
>> Maximum number of threads per multiprocessor: 2048
>> Maximum number of threads per block: 1024
>> Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
>> Max dimension size of a grid size (x,y,z): (2147483647, 65535,
>> 65535)
>> Maximum memory pitch: 2147483647 bytes
>> Texture alignment: 512 bytes
>> Concurrent copy and kernel execution: Yes with 1 copy engine(s)
>> Run time limit on kernels: No
>> Integrated GPU sharing Host Memory: No
>> Support host page-locked memory mapping: Yes
>> Alignment requirement for Surfaces: Yes
>> Device has ECC support: Disabled
>> Device supports Unified Addressing (UVA): Yes
>> Device PCI Domain ID / Bus ID / location ID: 0 / 134 / 0
>> Compute Mode:
>> < Default (multiple host threads can use ::cudaSetDevice() with
>> device simultaneously) >
>>
>> Device 3: "GRID K1"
>> CUDA Driver Version / Runtime Version 7.0 / 7.0
>> CUDA Capability Major/Minor version number: 3.0
>> Total amount of global memory: 4096 MBytes (4294770688
>> bytes)
>> ( 1) Multiprocessors, (192) CUDA Cores/MP: 192 CUDA Cores
>> GPU Max Clock rate: 850 MHz (0.85 GHz)
>> Memory Clock rate: 891 Mhz
>> Memory Bus Width: 128-bit
>> L2 Cache Size: 262144 bytes
>> Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,
>> 65536), 3D=(4096, 4096, 4096)
>> Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
>> Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048
>> layers
>> Total amount of constant memory: 65536 bytes
>> Total amount of shared memory per block: 49152 bytes
>> Total number of registers available per block: 65536
>> Warp size: 32
>> Maximum number of threads per multiprocessor: 2048
>> Maximum number of threads per block: 1024
>> Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
>> Max dimension size of a grid size (x,y,z): (2147483647, 65535,
>> 65535)
>> Maximum memory pitch: 2147483647 bytes
>> Texture alignment: 512 bytes
>> Concurrent copy and kernel execution: Yes with 1 copy engine(s)
>> Run time limit on kernels: No
>> Integrated GPU sharing Host Memory: No
>> Support host page-locked memory mapping: Yes
>> Alignment requirement for Surfaces: Yes
>> Device has ECC support: Disabled
>> Device supports Unified Addressing (UVA): Yes
>> Device PCI Domain ID / Bus ID / location ID: 0 / 135 / 0
>> Compute Mode:
>> < Default (multiple host threads can use ::cudaSetDevice() with
>> device simultaneously) >
>> > Peer access from GRID K1 (GPU0) -> GRID K1 (GPU1) : Yes
>> > Peer access from GRID K1 (GPU0) -> GRID K1 (GPU2) : Yes
>> > Peer access from GRID K1 (GPU0) -> GRID K1 (GPU3) : Yes
>> > Peer access from GRID K1 (GPU1) -> GRID K1 (GPU1) : No
>> > Peer access from GRID K1 (GPU1) -> GRID K1 (GPU2) : Yes
>> > Peer access from GRID K1 (GPU1) -> GRID K1 (GPU3) : Yes
>> > Peer access from GRID K1 (GPU2) -> GRID K1 (GPU1) : Yes
>> > Peer access from GRID K1 (GPU2) -> GRID K1 (GPU2) : No
>> > Peer access from GRID K1 (GPU2) -> GRID K1 (GPU3) : Yes
>> > Peer access from GRID K1 (GPU1) -> GRID K1 (GPU0) : Yes
>> > Peer access from GRID K1 (GPU1) -> GRID K1 (GPU1) : No
>> > Peer access from GRID K1 (GPU1) -> GRID K1 (GPU2) : Yes
>> > Peer access from GRID K1 (GPU2) -> GRID K1 (GPU0) : Yes
>> > Peer access from GRID K1 (GPU2) -> GRID K1 (GPU1) : Yes
>> > Peer access from GRID K1 (GPU2) -> GRID K1 (GPU2) : No
>> > Peer access from GRID K1 (GPU3) -> GRID K1 (GPU0) : Yes
>> > Peer access from GRID K1 (GPU3) -> GRID K1 (GPU1) : Yes
>> > Peer access from GRID K1 (GPU3) -> GRID K1 (GPU2) : Yes
>>
>> deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 7.0, CUDA
>> Runtime Version = 7.0, NumDevs = 4, Device0 = GRID K1, Device1 = GRID K1,
>> Device2 = GRID K1, Device3 = GRID K1 Result = PASS
>>
>
> I get the same error message. What should we look at? The information
> for my PCIe GPU is below:
>
>
> [root at localhost release]# ./deviceQuery
> ./deviceQuery Starting...
>
> CUDA Device Query (Runtime API) version (CUDART static linking)
>
> Detected 1 CUDA Capable device(s)
>
> Device 0: "Tesla K20c"
> CUDA Driver Version / Runtime Version 7.0 / 7.0
> CUDA Capability Major/Minor version number: 3.5
> Total amount of global memory: 4800 MBytes (5032706048
> bytes)
> (13) Multiprocessors, (192) CUDA Cores/MP: 2496 CUDA Cores
> GPU Max Clock rate: 706 MHz (0.71 GHz)
> Memory Clock rate: 2600 Mhz
> Memory Bus Width: 320-bit
> L2 Cache Size: 1310720 bytes
> Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,
> 65536), 3D=(4096, 4096, 4096)
> Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
> Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048
> layers
> Total amount of constant memory: 65536 bytes
> Total amount of shared memory per block: 49152 bytes
> Total number of registers available per block: 65536
> Warp size: 32
> Maximum number of threads per multiprocessor: 2048
> Maximum number of threads per block: 1024
> Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
> Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
> Maximum memory pitch: 2147483647 bytes
> Texture alignment: 512 bytes
> Concurrent copy and kernel execution: Yes with 2 copy engine(s)
> Run time limit on kernels: No
> Integrated GPU sharing Host Memory: No
> Support host page-locked memory mapping: Yes
> Alignment requirement for Surfaces: Yes
> Device has ECC support: Enabled
> Device supports Unified Addressing (UVA): Yes
> Device PCI Domain ID / Bus ID / location ID: 0 / 4 / 0
> Compute Mode:
> < Default (multiple host threads can use ::cudaSetDevice() with
> device simultaneously) >
>
> deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 7.0, CUDA Runtime
> Version = 7.0, NumDevs = 1, Device0 = Tesla K20c
> Result = PASS
>
>
Hi Timo,
The status I see is 0x0A, which according to the header file
/usr/include/nvEncodeAPI.h is NV_ENC_ERR_OUT_OF_MEMORY. Perhaps the
memory allocation is too large?
/* 1MB is large enough to hold most output frames. NVENC increases this
 * automatically if it's not enough. */
allocOut.size = 1024 * 1024;
allocOut.memoryHeap = NV_ENC_MEMORY_HEAP_SYSMEM_CACHED;

/**
 * This indicates that the API call failed because it was unable to
 * allocate enough memory to perform the requested operation.
 */
NV_ENC_ERR_OUT_OF_MEMORY,
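
To make the numeric status easier to read in a log, the lookup from a
raw code such as 0x0A to its enum name can be sketched as below. This is
only a rough, standalone sketch: the functions nvenc_status_name(),
nvenc_status_from_name() and nvenc_format_status() are hypothetical
helpers, not part of ffmpeg or the NVENC SDK, and the table is a local
copy that assumes the names follow the order of the NVENCSTATUS enum in
nvEncodeAPI.h. Only the 0x0A entry is confirmed by the header excerpt
above; please check the others against your own copy of the header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Local copy of the first NVENCSTATUS names, indexed by numeric value.
 * ASSUMPTION: the order mirrors the enum in nvEncodeAPI.h; only index
 * 0x0A (NV_ENC_ERR_OUT_OF_MEMORY) is confirmed in this thread. */
static const char *const nvenc_status_names[] = {
    "NV_ENC_SUCCESS",                   /* 0x00 */
    "NV_ENC_ERR_NO_ENCODE_DEVICE",      /* 0x01 */
    "NV_ENC_ERR_UNSUPPORTED_DEVICE",    /* 0x02 */
    "NV_ENC_ERR_INVALID_ENCODERDEVICE", /* 0x03 */
    "NV_ENC_ERR_INVALID_DEVICE",        /* 0x04 */
    "NV_ENC_ERR_DEVICE_NOT_EXIST",      /* 0x05 */
    "NV_ENC_ERR_INVALID_PTR",           /* 0x06 */
    "NV_ENC_ERR_INVALID_EVENT",         /* 0x07 */
    "NV_ENC_ERR_INVALID_PARAM",         /* 0x08 */
    "NV_ENC_ERR_INVALID_CALL",          /* 0x09 */
    "NV_ENC_ERR_OUT_OF_MEMORY",         /* 0x0A */
};

#define NVENC_STATUS_COUNT \
    (sizeof(nvenc_status_names) / sizeof(nvenc_status_names[0]))

/* Hypothetical helper: map a numeric status to its name. Codes that
 * are not in the local table yield a fixed fallback string. */
const char *nvenc_status_name(unsigned status)
{
    return status < NVENC_STATUS_COUNT ? nvenc_status_names[status]
                                       : "unknown NVENC status";
}

/* Hypothetical helper: reverse lookup; returns -1 for unknown names. */
int nvenc_status_from_name(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < NVENC_STATUS_COUNT; i++)
        if (strcmp(nvenc_status_names[i], name) == 0)
            return (int)i;
    return -1;
}

/* Hypothetical helper: format "<call> failed: 0xNN (<name>)" into buf.
 * Returns what snprintf() returns. */
int nvenc_format_status(char *buf, size_t len, const char *call,
                        unsigned status)
{
    assert(buf != NULL && call != NULL);
    return snprintf(buf, len, "%s failed: 0x%02X (%s)", call, status,
                    nvenc_status_name(status));
}
```

Calling nvenc_format_status(buf, sizeof(buf), "CreateInputBuffer", 0x0A)
would fill buf with a message that names NV_ENC_ERR_OUT_OF_MEMORY next
to the raw code, which is easier to search for than the bare
"CreateInputBuffer failed" line in the log above.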