[FFmpeg-devel] [PATCH 0/2] RFC: Generic hwaccel for cuvid v2

Sun Jun 25 14:43:12 EEST 2017

On 25/06/17 00:40, Philip Langdale wrote:
> Second try.
> 
> Feedback on first proposal was lack of interest in having default behaviour
> vary between hwaccels, despite their other differences.
> 
> In this patch series, I instead force the user to confront the change in
> command line semantics when doing cuvid/nvenc transcoding. They will only
> get full transcoding with an additional command line argument, so force
> them to specify that argument.
> 
> Philip Langdale (2):
>   ffmpeg: Switch cuvid to generic hwaccel
>   ffmpeg: Require output format when using cuvid hwaccel
> 
>  Makefile       |  1 -
>  ffmpeg.h       |  2 +-
>  ffmpeg_cuvid.c | 73 ----------------------------------------------------------
>  ffmpeg_opt.c   | 31 ++++++++++++++++++-------
>  4 files changed, 24 insertions(+), 83 deletions(-)
>  delete mode 100644 ffmpeg_cuvid.c

Can we take a step back here and consider what we actually want the result to be?

I think what we want to achieve is:

1)  CUDA devices passed to CUVID decoders as AVCodecContext.hw_device_ctx:
  a)  If we have exactly one device, always pass it.
  b)  If we have multiple devices, allow -hwaccel_device to select which one to pass (fail or make an arbitrary choice if not present?).
  c)  If none, either make a default one or pass nothing (depends what works).
* Note that this decision logic is completely unaffected by whether the dummy hwaccel is specified or not.)

2)  Output / get_format
  a)  If the dummy hwaccel is specified, output in hwaccel_output_format or default to CUDA frames if not specified.
  b)  Otherwise, always output in normal memory.
* Alternatively, actually change the semantics and be clear about what the new ones are.

3)  Remove all of the CUVID-specific code from ffmpeg.

Point (1) pretty much already works without any change - see hw_device_setup_for_decode().  The only missing feature here is that the name matching doesn't actually work for CUVID (strstr("h264_cuvid", "cuda") -> NULL).  I'm not sure that's actually a problem, though - the CUVID decoder is happy to pull a new device instance out of nothing, so it isn't actually needed in the standalone device case.  A hack there could give slightly more consistent behaviour, though.

(An imagined solution to avoid the string matching completely was to add a field to AVCodec which contains the set of devices that it can use, but that's a more invasive library change that noone has pursued yet.)

Point (2) is somewhat more subtle.  The default hwaccel behaviour is made for the real hwaccels attached to the normal decoder, and won't do the right thing for the dummy ones.  The specific case that we strongly want to avoid is some normal setting where the output is downloaded to normal memory from a hardware frame inside ffmpeg, because that is almost certainly done more efficiently in the decoder itself (for the CUVID case, it actually has two explicit copies if you do this).  It's rare that you ever want to specify anything other than the hardware format or the corresponding software format for decode (and in fact I think only VAAPI supports such convert-on-download cases anyway), so the single toggle is usually sufficient.

Therefore, I think we should just clearly distinguish between the two general behaviours for the hwaccel case - real and dummy.  That's essentially what your patch at <https://lists.ffmpeg.org/pipermail/ffmpeg-devel/2017-June/212566.html> did, but in a slightly implicit way - I would put it in hwaccel_decode_init() rather than in the option parsing code.

There was some question of whether all hwaccels (real and dummy) should behave identically here, but given the nasty default case if we do that I don't think it's justified (though feel free to argue further on this point if you feel strongly, I'm not that attached to it).

TL;DR - I preferred the mechanism of the previous version, with some changes to make it clearer what the distinction is.

Thanks,

- Mark

PS:  The libmfx behaviour unfortunately isn't really a very useful comparison, because the implementation is incomplete.  The important change with the generic code was that the string matching does the right thing to create a device for the non-hwaccel case.  The hwaccel case didn't change at all because hw_device_ctx + hardware frame output isn't supported in the decoder.  I don't have any plans to implement it soon because I regard the libmfx decoders as mostly deprecated at this point (use real hwaccel for the platform + hwmap for encode instead), but I wouldn't mind it being done by someone else if they care.