[FFmpeg-devel] [PATCH] Added the possibility to pass an externally created CUDA context to libavutil/hwcontext.c/av_hwdevice_ctx_create() for decoding with NVDEC

Oscar Amoros Huguet oamoros at mediapro.tv
Mon May 7 15:04:55 EEST 2018


Hi!

Even if a synchronization is needed before leaving the ffmpeg call, calling cuMemcpyAsync will allow the copies to overlap with any other task on the GPU that was enqueued using any other non-blocking CUDA stream. That’s exactly what we want to achieve.

This would automatically benefit any other app that uses non-blocking CUDA streams as independent CUDA workflows.

Oscar

Sent from my iPhone

On 7 May 2018, at 13:54, Timo Rothenpieler <timo at rothenpieler.org> wrote:

>>> Additionally, could you give your opinion on the feature we also may
>>> want to add in the future, that we mentioned in the previous email?
>>> Basically, we may want to add one more CUDA function, specifically
>>> cuMemcpy2DAsync, and the possibility to set a CUStream in
>>> AVCUDADeviceContext, so it is used with cuMemcpy2DAsync instead of
>>> cuMemcpy2D in "nvdec_retrieve_data" in file libavcodec/nvdec.c. In our
>>> use case this would save up to 0.72 ms (GPU time) per frame, in case of
>>> decoding 8 Full HD frames, and up to 0.5 ms (GPU time) per frame, in case
>>> of decoding two 4K frames. This may sound like too little, but for us it
>>> is significant. Our software needs to do many things in a maximum of 33 ms
>>> with CUDA on the GPU per frame, and we have little GPU time left.
>> 
>> This is interesting and I'm considering making that the default, as it
>> would fit well with the current infrastructure, delaying the sync call
>> to the moment the frame leaves avcodec, which with the internal
>> re-ordering and delay should give plenty of time for the copy to finish.
> 
> I'm not sure if/how well this works with the mapped cuvid frames though.
> The frame would already be unmapped and potentially re-used again before
> the async copy completes. So it would need an immediate call to Sync
> right after the 3 async copy calls, making the entire effort pointless.
