[FFmpeg-devel] [PATCH]: Change Stack Frame Limit in Cuda Context

Ben Chang benchang621 at gmail.com
Fri Jan 26 22:51:48 EET 2018

On Fri, Jan 26, 2018 at 3:32 AM, Mark Thompson <sw at jkqxz.net> wrote:

> On 26/01/18 09:06, Ben Chang wrote:
> > Thanks for the review Mark.
> >
> > There are some CUDA kernels in the driver that may be invoked depending
> > on the nvenc operations specified on the command line. My observation
> > from looking at the nvcc statistics is that most stack frame sizes for
> > these CUDA kernels are 0 (the highest observed was 120 bytes).
> Right, that makes sense.  If Nvidia is happy that this will always work in
> drivers compatible with this API version (including any future ones) then
> sure.
I am not saying this should be the "permanent" value for stack frame size
per GPU thread. However, at this moment (looking at the existing CUDA
kernels that developers have control over), I do not see this reduction as
an issue.

> >>
> >>
> >> This is technically a user-visible change, since it will apply to all
> >> user programs run on the CUDA context created here as well as those
> >> inside ffmpeg.  I'm not sure how many people actually use that, though,
> >> so maybe it won't affect anyone.
> >>
> > In ffmpeg, I see vf_thumbnail_cuda and vf_scale_cuda available (not sure
> > if there are more, but these two should not be affected by this
> > reduction). Users can always raise the stack limit if their own custom
> > kernels require a larger stack frame.
> I don't mean filters inside ffmpeg, I mean a user program which probably
> uses NVDEC and/or NVENC (and possibly other things) from libavcodec but
> then does its own CUDA processing with the same context.  This is silently
> changing the setup underneath it, and 128 feels like a very small number.
Yes, this is really a trade-off between reducing memory usage (since there
are numerous complaints of high memory usage preventing more ffmpeg
instances from running) and user convenience (custom CUDA implementations
may be impacted). My thought (which can be wrong) is that users who
implement their own CUDA kernels may have better knowledge of CUDA (e.g.
how much stack frame size their kernel needs, or how to use the CUDA
debugger to find out what issue they may have). The stack frame size of a
kernel is really implementation dependent (e.g. allocating arrays on the
stack or the heap, recursion, how many registers spill, etc.), so stack
frame sizes may vary widely. The default, 1024 bytes, may not be enough at
times, and the user will need to adjust the stack limit accordingly anyway.
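
To put rough numbers on the memory side of this trade-off, here is a minimal
sketch in C. It assumes a simplified model in which stack space is reserved
for every thread that can be resident on the GPU at once, so the reservation
per context is roughly SM count x resident threads per SM x per-thread stack
limit. The function names and the model itself are illustrative assumptions
and are not taken from the patch or the driver; the per-thread limit is the
value the CUDA driver API exposes as CU_LIMIT_STACK_SIZE through
cuCtxSetLimit().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rough model (an assumption, not the driver's documented behaviour):
 * stack space is reserved for every thread that could be resident on
 * the GPU at the same time. */
static uint64_t stack_reservation_bytes(uint32_t sm_count,
                                        uint32_t max_threads_per_sm,
                                        size_t stack_bytes_per_thread)
{
    return (uint64_t)sm_count * max_threads_per_sm * stack_bytes_per_thread;
}

/* Bytes saved per CUDA context when the per-thread stack limit is
 * lowered from old_limit to new_limit under the model above. */
static uint64_t stack_reservation_saved(uint32_t sm_count,
                                        uint32_t max_threads_per_sm,
                                        size_t old_limit,
                                        size_t new_limit)
{
    assert(new_limit <= old_limit);
    return stack_reservation_bytes(sm_count, max_threads_per_sm, old_limit)
         - stack_reservation_bytes(sm_count, max_threads_per_sm, new_limit);
}
```

For a hypothetical GPU with 20 SMs and 2048 resident threads per SM, this
model gives 40 MiB per context at the 1024-byte default and 5 MiB at 128
bytes, i.e. about 35 MiB saved for each ffmpeg instance. Real figures depend
on the GPU and the driver.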

> >>
> >> If the stack limit is violated, what happens?  Will that be undefined
> >> behaviour with random effects (crash / incorrect results), or is it
> >> likely to be caught at program compile/load-time?
> >>
> > The stack will likely overflow and the kernel will terminate (though I
> > have yet to encounter this).
> As long as the user gets a clear message that a stack overflow has
> occurred so that they can realise that they need to raise the value then it
> should be fine.

I believe you will see the stack overflow reported if a CUDA debugger is
attached. But the default error may just be a generic kernel launch
error/failure. This goes back to my opinion that CUDA developers should be
able to figure this out relatively easily if they want to customize the
CUDA part of their program.

Copying Timo's comment from another thread to consolidate discussion.
>> Wouldn't it affect potential future CUDA filters, which might make more
>> use of the stack?
If NVIDIA introduces a new kernel that exceeds this limit, changes will need
to be made (but I do not think that will happen anytime soon).

