[FFmpeg-devel] [PATCH]: Change Stack Frame Limit in Cuda Context

Sat Jan 27 01:10:11 EET 2018

On 26/01/18 20:51, Ben Chang wrote:
> On Fri, Jan 26, 2018 at 3:32 AM, Mark Thompson <sw at jkqxz.net> wrote:
> 
>> On 26/01/18 09:06, Ben Chang wrote:
>>> Thanks for the review Mark.
>>>

To clarify, since it is less clear now with the trimmed context: my two comments about this change are completely independent.  (Given that, maybe it should be split into two parts - one for hwcontext and one for nvenc?)

This part is about the change to NVENC:

>>>  There are some cuda kernels in the driver that may be invoked depending
>> on
>>> the nvenc operations specified in the commandline. My observation from
>>> looking at the nvcc statistics is that most stack frame size for these
>> cuda
>>> kernels are 0 (highest observed was 120 bytes).
>>
>> Right, that makes sense.  If Nvidia is happy that this will always work in
>> drivers compatible with this API version (including any future ones) then
>> sure.
>>
> I am not saying this should be the "permanent" value for stack frame size
> per GPU thread. However, at this moment (looking at existing cuda kernels
> that devs have control over), I do not see this reduction as an issue.

I think you should be confident that the chosen value here will last well into the future for NVENC use.  Consider that this will end up in releases - if a future Nvidia driver update happens to need a larger stack then all previous releases and binaries will stop working for all users.

This part is about the change to the hwcontext device creation:

>>>> This is technically a user-visible change, since it will apply to all
>> user
>>>> programs run on the CUDA context created here as well as those inside
>>>> ffmpeg.  I'm not sure how many people actually use that, though, so
>> maybe
>>>> it won't affect anyone.
>>>>
>>> In ffmpeg, I see vf_thumbnail_cuda and vf_scale_cuda available (not sure
>> if
>>> there is more, but these two should not be affected by this reduction).
>>> User can always raise the stack limit size if their own custom kernel
>>> require higher stack frame size.
>>
>> I don't mean filters inside ffmpeg, I mean a user program which probably
>> uses NVDEC and/or NVENC (and possibly other things) from libavcodec but
>> then does its own CUDA processing with the same context.  This is silently
>> changing the setup underneath it, and 128 feels like a very small number.
>>
> Yes, this is really a trade off between reducing memory usage (since there
> are numerous complaints of high memory usage preventing having more ffmpeg
> instances) and user convenience (custom cuda implementation may be
> impacted). My thought (which can be wrong) is that users who implement
> their own cuda kernel may have better knowledge about cuda (eg. how much
> stack frame size their kernel needs or use cuda debugger to find out what
> issue they may have). The size of the kernels are really implementation
> dependent (eg, allocating arrays in stacks or heap, recursions, how much
> register spills, etc) so stack frame sizes may vary widely. The default,
> 1024 bytes, may not be enough at times and user will need to adjust the
> stack limit accordingly anyway.

Note that since you are changing a library the users in this context are all ffmpeg library users.  So, it means any program or library which uses ffmpeg, and transitively anyone who uses them.  The end-user need not be informed about CUDA at all.

(What you've said also makes it sound like it can change by compiler version, but I guess such changes should be small.)

>>>>
>>>> If the stack limit is violated, what happens?  Will that be undefined
>>>> behaviour with random effects (crash / incorrect results), or is it
>> likely
>>>> to be caught at program compile/load-time?
>>>>
>>> Stack will likely overflow and kernel will terminate (though I have yet
>>> encounter this before).
>>
>> As long as the user gets a clear message that a stack overflow has
>> occurred so that they can realise that they need to raise the value then it
>> should be fine.
> 
> I believe you will see stack overflow if attached to cuda debugger. But the
> default error may just be kernel launch error/failure. This goes back to my
> opinion that cuda developer should figure this out relatively easy if they
> want to customize the cuda part of their program.

Ok, sure.  It's possibly unfortunate if a user who knows nothing about CUDA rebuilds a program from source with newer ffmpeg libraries including this change because something could stop working in an opaque way, but if it errors out then it won't silently give the wrong answer and should be diagnosable by the original developer.

Thanks,

- Mark