[FFmpeg-devel] [PATCH] Add NVENC encoder

Sat Nov 29 16:26:07 CET 2014

On 29 Nov 2014 03:45, Nicolas George <george at nsup.org> wrote:
>
> Le septidi 7 frimaire, an CCXXIII, Timo Rothenpieler a écrit : 
> > Done that primarily to keep things cleaned up and easier to read. 
> > Can as well put it all in one huge file. 
>
> IMHO, your choice in the end. 
>
> > Propably, will split that out when i get to it. 
>
> Thanks. 
>
> > Most of this code is ported from a C++ Project where everything had 
> > to be camel case, so those and the C++ malloc casts are some of the 
> > leftovers. 
>
> Idem, IMHO your choice. 
>
> > The maximum is the number of adjacent B Frames plus one. So 3 in the 
> > case of NVENC, unless they change the supported gop patterns. 
>
> In that case, I suspect to take some margin and use a hardcoded size for the 
> queue: with 8 or 16, there is plenty of time to update the size if nvidia 
> releases new GOP patterns. 
>
> > I basically need a fifo like structure, where i can queue output 
> > surfaces until NVENC is done filling them. 
> > An array doesn't realy reflect that usage, as new elements are added 
> > to the front, and taken from the back. 
>
> An array used as a circular buffer is perfect for that. 

For what it's worth, the implementation I did against the
nvidia patch was a circular buffer and it works fine. The
docs I read say 4 + num b frames and max b frames can 
go up to 16 - not 2.

> > Is a simple assert on the return value enough? Can't continue in a 
> > sane way anyway if it ever fails. 
>
> No: an assert can only be used when the error is not possible except for a 
> bug in the code. A malloc failure is always possible, so a proper error 
> handling, eventually returning ENOMEM, is necessary. That is annoying but 
> mostly unavoidable. 
>
> > I renamed it to enqueue/dequeue. 
>
> Thanks. 
>
> > This definitely can't be replaced, its purpose isn't just a plain 
> > list, but sorting of the input timestamps, so the dts is still 
> > monotonic after re-ordering for B frames. 
>
> As far as I can see, you do not need the sorting random access to the sorted 
> timestamps but only to be able to extract the lowest one. This is precisely 
> what a heap is very good at. I assure you, in this case a heap will be 
> almost as simple to implement (certainly simpler if you make it fixed-size) 
> and way more efficient. 

Yeah, you don't need to sort. Circular fifo works fine. 

> > Only adding the CUDA error code, which then has to be looked up manualy. 
>
> Too bad ;( 
>
> > What do you mean by that? Printing which presets are available in 
> > the error message? 
>
> Yes. Something like that is always nice: 
>
> # x264 [error]: invalid preset 'list' 
> # [libx264 @ 0x172a560] Error setting preset/tune list/(null). 
> # [libx264 @ 0x172a560] Possible presets: ultrafast superfast veryfast faster 
> # fast medium slow slower veryslow placebo 
> # [libx264 @ 0x172a560] Possible tunes: film animation grain stillimage psnr 
> # ssim fastdecode zerolatency 
>
> > Not entirely sure why i did that this way. Copied it straight from 
> > the libx264 encoder, without thinking too much about it. I can just 
> > set the profileGUID straight from that switch and can remove the 
> > second profile variable(which the libx264 encoder has in exactly the 
> > same conflicting way) entirely. 
>
> It is entirely possible that libx264 has reasons that I do not know to do 
> things that way. It is also possible that the options are there just for 
> historic reasons. 
>
> > >>+        default: 
> > >>+        avctx->coded_frame->pict_type = AV_PICTURE_TYPE_NONE; 
> > Not that I'm aware of. But i don't know what else to assume in this case. 
>
> Then I believe some kind of feedback would be needed. If it is really never 
> supposed to happen, an assert is acceptable; otherwise, an error requesting 
> a sample may be better. 
>
> > The maximum supported number of surfaces is allocated, if it'd ever 
> > run out, there'd be a bug in the code managing the surfaces. 
>
> Then an assert is exactly what is needed. 
>
> > ff_cuCtxCreate is a library function loaded from the CUDA dll/so. 
> > Same for all the other ff_cu* functions, there is no way to change 
> > what it returns, as it's not my function. 
>
> I missed that, sorry. 
>
> > >>+#define ifav_log(...) if (avctx) { av_log(__VA_ARGS__); } 
> > >Looks strange: why no error message when there is no context? 
> > Is it possible to call av_log without a context? 
>
> Yes, it just prints the message without the [libfoo @ 0x42] prefix. 
>
> > No, does it need to be? Can multiple threads create the coded at the 
> > same time? 
>
> It looks like it is called from encode_init, which can definitely be called 
> by multiple threads. 
>
> Thanks for your efforts. Hope this helps. 
>
> Regards, 
>
> -- 
>   Nicolas George
>
> _______________________________________________ 
> ffmpeg-devel mailing list 
> ffmpeg-devel at ffmpeg.org 
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel