[FFmpeg-devel] Sharing cuda context between transcode sessions to reduce initialization overhead

Mark Thompson sw at jkqxz.net
Tue Jun 13 01:48:50 EEST 2017



On 12/06/17 22:33, Hendrik Leppkes wrote:
> On 12.06.2017 at 10:38 PM, "Ganapathy Raman Kasi" <gkasi at nvidia.com> wrote:
> 
> Hi,
> 
> 
> Currently, in the case of a 1 -> N transcode (1 SW decode -> N NVENC
> encodes) without the HW upload filter, we end up allocating multiple CUDA
> contexts for the N transcode sessions on the same underlying GPU device.
> This incurs the CUDA context initialization overhead (~100 ms per
> context creation on a 4th-gen i5 with a GTX 1080 under Ubuntu 16.04). Also,
> in the case of an M * (1->N) fully HW-accelerated transcode, we face the
> same issue: the CUDA context is not shared between the M transcode sessions.
> Sharing the context would greatly reduce the initialization time, which
> matters for short clip transcodes.
> 
> 
> I currently have a global array in avutil/hwcontext_cuda.c which keeps
> track of the CUDA contexts created and reuses an existing context when
> a request to create a hwdevice ctx occurs. This is shared in the attached
> patch. Please check the approach and let me know if there is a better/cleaner
> way to do this. Thanks.
> 
> 
> Global state in the libraries is something we absolutely try to stay away
> from, so this approach is not quite appropriate.
> 
> If you want to somehow share this, it should be in the ffmpeg command line
> tool somewhere; however, we also try to reduce hardware-specific magic in
> favor of abstractions.

Using hwupload_cuda creates a new device out of nowhere in the middle of the graph, and you can't do anything to avoid that behaviour without nasty hackery in the libraries.  So, use generic hwupload instead, which will use a device provided by the user.

With the patch series just posted:

"ffmpeg ... -init_hw_device cuda=foo:bar -filter_hw_device foo -vf ...hwupload... -c:v nvenc... -vf ...hwupload... -c:v nvenc..."

It doesn't currently solve cases which require multiple devices in a single graph, though - thoughts definitely welcome on how to do that.
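
For anyone driving the libraries directly rather than through the ffmpeg tool, the same idea (the caller creates one device and hands a reference to every consumer, with no global state in the library) can be sketched roughly as below. This is only an illustrative model, not FFmpeg code: Device, Encoder, device_create(), device_ref(), device_unref(), encoder_open(), encoder_close() and run_sessions() are all hypothetical names invented for the sketch. In libavutil the analogous real calls would be av_hwdevice_ctx_create() for the one-time creation and av_buffer_ref() for each additional user.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an expensive device context (e.g. a CUDA context). */
typedef struct Device {
    int refcount;
    int id;
} Device;

/* Counts how often the expensive initialisation ran; for illustration only. */
static int device_init_count = 0;

/* Create the device once; the caller owns the initial reference. */
static Device *device_create(int id)
{
    Device *d = malloc(sizeof(*d));
    if (!d)
        return NULL;
    d->refcount = 1;
    d->id = id;
    device_init_count++;
    return d;
}

/* Take an additional reference to an existing device. */
static Device *device_ref(Device *d)
{
    assert(d && d->refcount > 0);
    d->refcount++;
    return d;
}

/* Drop one reference; the device is freed when the last one goes away. */
static void device_unref(Device **pd)
{
    Device *d;
    if (!pd || !*pd)
        return;
    d = *pd;
    *pd = NULL;
    if (--d->refcount == 0)
        free(d);
}

/* Hypothetical encoder that uses a device supplied by the caller. */
typedef struct Encoder {
    Device *dev;
} Encoder;

static void encoder_open(Encoder *e, Device *shared)
{
    e->dev = device_ref(shared); /* share the device, do not create a new one */
}

static void encoder_close(Encoder *e)
{
    device_unref(&e->dev);
}

/* Open n encoders on one caller-owned device.
 * Returns the number of device initialisations performed, or -1 on error. */
static int run_sessions(int n)
{
    int before = device_init_count;
    Device *dev;
    Encoder *enc;
    int i;

    if (n <= 0)
        return -1;
    dev = device_create(0);
    if (!dev)
        return -1;
    enc = calloc((size_t)n, sizeof(*enc));
    if (!enc) {
        device_unref(&dev);
        return -1;
    }
    for (i = 0; i < n; i++)
        encoder_open(&enc[i], dev);
    for (i = 0; i < n; i++)
        encoder_close(&enc[i]);
    free(enc);
    device_unref(&dev); /* last reference: the device is destroyed here */
    return device_init_count - before;
}
```

In this model run_sessions(n) pays the initialisation cost once regardless of n, and the shared state lives with the caller rather than in a library-global table.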


- Mark

