[FFmpeg-devel] [PATCH][DISCUSS] nvenc: Add encoder flush API.

Tue Nov 19 03:13:14 EET 2019

This patch is meant to be an entry point for discussion around an
issue we are having with flushing the nvenc encoder while doing
segmented transcoding. Hopefully there will be a less kludgey
workaround than this.

First, some background on where this is coming from. We do
segmented transcoding on Nvidia GPUs using the libav* libraries [0].
The flow is roughly this:

1. Segment incoming stream
2. Send each segment to a transcoder

We've noticed significant overhead in setting up new transcode
sessions / contexts for each segment, and this overhead grows with
the number of streams a given machine is processing, regardless of the
number of attached GPUs [1].

Now, the logical solution here would be to reuse the GPU sessions
across the segments of a given stream. However, there is a problem
with flushing internal decode / encode buffers. Because we do
segmented transcoding [2], we need to ensure that all stages in the
transcode pipeline are completely flushed in between each segment.

Here is what we do for each stage of decode, filter and encode:

* Decoding : Cache the first packet of each segment. When the
  IO layer hits EOF, re-send the cached packet with a sentinel pts
  of -1. (This doesn't seem to cause issues with h264_cuvid.) Once the
  decoder returns a frame carrying the sentinel pts, we know it has
  been flushed of all legitimate input. For a typical 2-second
  segment, this has added about 6 frames (~10%) of overhead, which is
  tolerable because decoding is usually less expensive than encoding.
  No changes are required to FFmpeg itself, which is nice.

* Filtering : Close the filtergraph (via av_buffersrc_close) and
  re-initialize the filtergraph with each segment. Again, the overhead
  here seems tolerable. We have not found a straightforward way to
  drain the filtergraph without also closing or re-opening it.

* Encoding : This patch.

  We add a very special "av_nvenc_flush" API to signal end-of-stream
  in the same way as `avcodec_send_frame(ctx, NULL)`, but bypassing
  all the higher-level libavcodec machinery before hitting nvenc.
  This seems to successfully drain pending frames. Afterwards,
  we can continue to send frames for the next segments via
  `avcodec_send_frame`, and the internal state will more-or-less
  reinitialize as if nothing had happened.

  Now, it is quite likely that this behavior is entirely accidental
  and should not be expected to remain stable in the future.

  While the nvenc encoder itself does seem to be "resumable" according
  to the documentation around the `NV_ENC_FLAGS_EOS` flag (cf.
  NVIDIA Video Encoder API Programming Guide), FFmpeg has no such
  mode. So we've had to sort of inject one in here.
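The decoder-side sentinel trick can be illustrated without a GPU. The C sketch below is a hypothetical simulation, not libavcodec code: `MockDecoder`, `mock_decode()` and `decode_segment()` are made-up names, the decoder is assumed to emit frames in input order after a fixed delay (`DECODER_DELAY`, set to 6 purely for illustration), and only pts values are tracked rather than real packet data.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a decoder with a fixed output delay. It is not
 * libavcodec: it only models the property the sentinel trick relies on,
 * namely that frames come out in the order packets went in, DECODER_DELAY
 * packets late. */
#define DECODER_DELAY 6
#define SENTINEL_PTS  (-1)

typedef struct MockDecoder {
    int64_t queue[DECODER_DELAY + 1]; /* pts of packets not yet output */
    size_t  count;
} MockDecoder;

/* Feed one packet; returns 1 and sets *out_pts when a frame comes out. */
static int mock_decode(MockDecoder *d, int64_t pkt_pts, int64_t *out_pts)
{
    d->queue[d->count++] = pkt_pts;
    if (d->count <= DECODER_DELAY)
        return 0;                     /* pipeline still filling */
    *out_pts = d->queue[0];
    for (size_t i = 1; i < d->count; i++)
        d->queue[i - 1] = d->queue[i];
    d->count--;
    return 1;
}

/* Decode one segment of n packets (pts 0..n-1), then drain it by re-feeding
 * the cached first packet with SENTINEL_PTS until a sentinel frame comes out.
 * Stores legitimate pts in frames[], returns how many, and reports the
 * number of padding packets in *extra. Assumes n >= DECODER_DELAY, so that
 * sentinel frames left over from the previous segment are discarded by the
 * first loop and cannot end the drain early. */
static size_t decode_segment(MockDecoder *d, size_t n,
                             int64_t *frames, size_t *extra)
{
    size_t got = 0;
    int64_t pts;

    *extra = 0;
    for (size_t i = 0; i < n; i++)
        if (mock_decode(d, (int64_t)i, &pts) && pts != SENTINEL_PTS)
            frames[got++] = pts;      /* drop leftover sentinel frames */

    for (;;) {
        (*extra)++;                   /* re-send cached packet, pts = -1 */
        if (!mock_decode(d, SENTINEL_PTS, &pts))
            continue;
        if (pts == SENTINEL_PTS)
            break;                    /* all legitimate input is out */
        frames[got++] = pts;
    }
    return got;
}
```

In this model the drain costs `DECODER_DELAY + 1` padding packets per segment, and the sentinel frames still queued after the drain are discarded at the start of the next segment.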

The questions here are:

* Are these workarounds reasonable for the problem of Nvidia GPU
  sessions taking a long time to initialize when transcoding under
  load?

* Is there an alternative to carrying around this patch to flush
  the encoder in between segments?

* If there is no alternative, would you be open to a more formalized
  addition to the avcodec API around "flushable" or "resumable"
  encoders?
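To make the last question concrete, here is a hypothetical C model of what a "resumable" encoder could look like from the caller's side. Nothing in it is nvenc or libavcodec code: `MockEncoder`, `mock_send_frame()`, `mock_receive_packet()`, `encode_segment()`, the `LOOKAHEAD` depth and the `MOCK_*` return codes (stand-ins for `AVERROR(EAGAIN)` and `AVERROR_EOF`) are all assumptions made for illustration. A NULL frame marks the end of a segment, and the `resumable` flag decides whether a later frame restarts encoding or is rejected.

```c
#include <stddef.h>

/* Hypothetical model of a "resumable" encoder. Not nvenc.c and not an
 * FFmpeg API; the MOCK_* codes stand in for AVERROR(EAGAIN) / AVERROR_EOF. */
#define LOOKAHEAD   4
#define MOCK_EAGAIN (-2)
#define MOCK_EOF    (-1)

typedef struct MockEncoder {
    int    pending[LOOKAHEAD + 1]; /* frames buffered inside the encoder */
    size_t count;
    int    flushing;               /* set by a NULL frame */
    int    resumable;              /* may a new frame follow a flush? */
} MockEncoder;

/* frame == NULL marks the end of a segment (cf. NV_ENC_FLAGS_EOS). */
static int mock_send_frame(MockEncoder *e, const int *frame)
{
    if (!frame) {
        e->flushing = 1;
        return 0;
    }
    if (e->flushing && !e->resumable)
        return MOCK_EOF;           /* one-way drain, as with avcodec_send_frame() */
    if (e->count > LOOKAHEAD)
        return MOCK_EAGAIN;        /* caller must receive packets first */
    e->flushing = 0;               /* resume: a new segment starts */
    e->pending[e->count++] = *frame;
    return 0;
}

/* Packets are held back until the lookahead is full, unless flushing. */
static int mock_receive_packet(MockEncoder *e, int *pkt)
{
    if (e->count == 0 || (!e->flushing && e->count <= LOOKAHEAD))
        return e->flushing ? MOCK_EOF : MOCK_EAGAIN;
    *pkt = e->pending[0];
    for (size_t i = 1; i < e->count; i++)
        e->pending[i - 1] = e->pending[i];
    e->count--;
    return 0;
}

/* Encode one segment, then flush it; returns the number of packets in out. */
static size_t encode_segment(MockEncoder *e, const int *frames, size_t n,
                             int *out)
{
    size_t got = 0;

    for (size_t i = 0; i < n; i++) {
        if (mock_send_frame(e, &frames[i]) == MOCK_EOF)
            break;                 /* encoder cannot be reused */
        while (mock_receive_packet(e, &out[got]) == 0)
            got++;
    }
    mock_send_frame(e, NULL);      /* end of segment */
    while (mock_receive_packet(e, &out[got]) == 0)
        got++;                     /* stops at MOCK_EOF */
    return got;
}
```

With `resumable` set, both segments come back complete and in order. With it cleared, the second segment is rejected with the EOF stand-in, which mirrors a libavcodec caller after `avcodec_send_frame(ctx, NULL)`: once an encoder has been flushed, `avcodec_send_frame` returns `AVERROR_EOF`. The patch below avoids that state by calling `ff_nvenc_send_frame(avctx, NULL)` directly.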

Thanks for your thoughts!

Josh

[0] https://github.com/livepeer/lpms

[1] https://gist.github.com/j0sh/ae9e5a97e794e364a6dfe513fa2591c2

[2] For historical reasons that we cannot easily change right now
---
 libavcodec/avcodec.h | 2 ++
 libavcodec/nvenc.c   | 5 +++++
 2 files changed, 7 insertions(+)

diff --git a/libavcodec/avcodec.h b/libavcodec/avcodec.h
index bcb931f0dd..763a557d82 100644
--- a/libavcodec/avcodec.h
+++ b/libavcodec/avcodec.h
@@ -6232,6 +6232,8 @@ const AVCodecDescriptor *avcodec_descriptor_get_by_name(const char *name);
  */
 AVCPBProperties *av_cpb_properties_alloc(size_t *size);
 
+int av_nvenc_flush(AVCodecContext *avctx);
+
 /**
  * @}
  */
diff --git a/libavcodec/nvenc.c b/libavcodec/nvenc.c
index 111048d043..36134fa6a9 100644
--- a/libavcodec/nvenc.c
+++ b/libavcodec/nvenc.c
@@ -2071,6 +2071,11 @@ static void reconfig_encoder(AVCodecContext *avctx, const AVFrame *frame)
     }
 }
 
+int attribute_align_arg av_nvenc_flush(AVCodecContext *avctx)
+{
+    return ff_nvenc_send_frame(avctx, NULL);
+}
+
 int ff_nvenc_send_frame(AVCodecContext *avctx, const AVFrame *frame)
 {
     NVENCSTATUS nv_status;
-- 
2.17.1