[FFmpeg-devel] Added - HW accelerated H.264 and HEVC encoding for AMD GPUs based on AMF SDK
Mironov, Mikhail
Mikhail.Mironov at amd.com
Mon Oct 30 00:57:31 EET 2017
> -----Original Message-----
> From: ffmpeg-devel [mailto:ffmpeg-devel-bounces at ffmpeg.org] On Behalf
> Of Mark Thompson
> Sent: October 29, 2017 5:51 PM
> To: ffmpeg-devel at ffmpeg.org
> Subject: Re: [FFmpeg-devel] Added - HW accelerated H.264 and HEVC
> encoding for AMD GPUs based on AMF SDK
>
> On 29/10/17 20:48, Mironov, Mikhail wrote:
> >> -----Original Message-----
> >> From: ffmpeg-devel [mailto:ffmpeg-devel-bounces at ffmpeg.org] On
> Behalf
> >> Of Mark Thompson
> >> Sent: October 29, 2017 3:36 PM
> >> To: ffmpeg-devel at ffmpeg.org
> >> Subject: Re: [FFmpeg-devel] Added - HW accelerated H.264 and HEVC
> >> encoding for AMD GPUs based on AMF SDK
> >>
> >> On 29/10/17 18:39, Mironov, Mikhail wrote:
> >>>>
> >>>>> + { AV_PIX_FMT_D3D11, AMF_SURFACE_NV12 },
> >>>>
> >>>> D3D11 surfaces need not be NV12. The actual format is in
> >>>> AVHWFramesContext.sw_format - if you only support 8-bit then
> >>>> something nasty probably happens if you give it P010 surfaces.
> >>>>
> >>>
> >>> Agreed, but how should I define D3D11 NV12 as input format if I only
> >>> have
> >> AV_PIX_FMT_D3D11?
> >>
> >> Check sw_format afterwards.
> >>
> >
> > sw_format is part of AVHWFramesContext in hwcodec.h But how should I
> > define pix_fmt array in AVCodec? For example, In nvenc.c is done via
> > AV_PIX_FMT_CUDA. Is it wrong?
>
> NVENC checks sw_format at init time - see
> <http://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavcodec/nvenc.c;h=e1d3316de39cfefa41432105aed91f0d4e93a154;hb=HEAD#l1408>. I think you want
> to do the same thing.
>
> (I agree this result isn't ideal with just the pix_fmt array, but there isn't
> currently a way to check sw_format as well from the outside. Checking it at
> init time and failing if it isn't usable is the best we can do.)
>
Sure, I will add the check at initialization.
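For reference, a standalone sketch of that init-time check (the mock types and the helper name amf_check_sw_format are hypothetical; the real code would read sw_format from the AVHWFramesContext behind avctx->hw_frames_ctx, the same way nvenc does):

```c
/* Hypothetical sketch: reject unsupported sw_formats at init time
 * instead of producing garbage later. MockPixFmt stands in for
 * AVPixelFormat; the real check would read
 * ((AVHWFramesContext*)avctx->hw_frames_ctx->data)->sw_format. */
enum MockPixFmt { MOCK_PIX_FMT_NV12, MOCK_PIX_FMT_P010 };

static int amf_check_sw_format(enum MockPixFmt sw_format)
{
    /* only 8-bit NV12 surfaces assumed supported here; anything
     * else (e.g. P010) fails init instead of encoding bad frames */
    if (sw_format != MOCK_PIX_FMT_NV12)
        return -1; /* AVERROR(EINVAL) in the real encoder */
    return 0;
}
```

With this, an unsupported surface format is reported to the user before any frame is submitted.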
> >>>>> +
> >>>>> + // try to reuse existing DX device
> >>>>> +
> >>>>> + if (avctx->hw_frames_ctx) {
> >>>>> +        AVHWFramesContext *device_ctx = (AVHWFramesContext*)avctx->hw_frames_ctx->data;
> >>>>> +        if (device_ctx->device_ctx->type == AV_HWDEVICE_TYPE_D3D11VA){
> >>>>> +            if (device_ctx->device_ctx->hwctx) {
> >>>>> +                AVD3D11VADeviceContext *device_d3d11 = (AVD3D11VADeviceContext *)device_ctx->device_ctx->hwctx;
> >>>>> +                res = ctx->context->pVtbl->InitDX11(ctx->context, device_d3d11->device, AMF_DX11_1);
> >>>>> +                if (res == AMF_OK) {
> >>>>> +                    ctx->hw_frames_ctx = av_buffer_ref(avctx->hw_frames_ctx);
> >>>>> +                } else {
> >>>>> +                    av_log(avctx, AV_LOG_INFO, "amf_shared: avctx->hw_frames_ctx has non-AMD device, switching to default");
> >>>>
> >>>> I'm not sure this is going to act sensibly - if the user has D3D11
> >>>> frames input on another device, does it work?
> >>>>
> >>>
> >>> If device is not AMD's the code is trying to create another device
> >>> - the
> >> compatible one .
> >>> In these cases the submission will be via system memory.
> >>
> >> And that works with D3D11 frames as hw_frames_ctx on another device?
> >
> > Why not? AMF will be initialized with a different device, in
> > submission code it is detected, surface with system (host) memory is
> > allocated, data will be copied using av_image_copy() and host memory will
> be submitted to AMF.
>
> It won't be copied externally if it's already in a D3D11 surface for the other
> device. At the moment it looks like it silently encodes an empty frame
> because the conditions in ff_amf_encode_frame() make it call
> amf_copy_surface(), but av_image_copy() doesn't support copying from
> hardware surfaces and has no way to indicate that failure.
>
> I think the easiest way to solve this is to fail at init time if the given
> hw_frames_ctx is on an unusable device. The hw_device_ctx case matters
> less, because the user input is not going to be in D3D11 frames in that case.
I thought that in a multi-GPU situation it would still be good to support encoding.
I could force the HW frame to be copied into system memory and consume it in the encoder via
transfer_data_to(). Would that be better than failing at initialization?
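A sketch of that fallback path (all types and names here are mock stand-ins; in the real encoder the download would be av_hwframe_transfer_data() from the D3D11 frame into a system-memory AVFrame before submission):

```c
#include <string.h>

/* Hypothetical model of the multi-GPU fallback: if the input frame
 * lives on a foreign device, download it to system memory first and
 * submit the copy. MockFrame stands in for AVFrame. */
struct MockFrame {
    int is_hw;              /* 1: "hardware" surface, 0: system memory */
    unsigned char data[16]; /* stand-in for pixel data */
};

static int amf_download_frame(struct MockFrame *dst, const struct MockFrame *src)
{
    if (!src->is_hw)
        return -1; /* already in system memory, nothing to download */
    memcpy(dst->data, src->data, sizeof(dst->data));
    dst->is_hw = 0; /* the copy is consumable by the host submission path */
    return 0;
}
```

The cost is an extra GPU-to-host copy per frame, but encoding keeps working when the frames were allocated on a non-AMD device.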
>
> >>>>> +        } else {
> >>>>> +            surface->pVtbl->Release(surface);
> >>>>> +            surface = NULL;
> >>>>> +            AMF_RETURN_IF_FALSE(ctx, res == AMF_OK, AVERROR_UNKNOWN, "SubmitInput() failed with error %d", res);
> >>>>> +            submitted = 1;
> >>>>> +        }
> >>>>> +    }
> >>>>> +    // poll results
> >>>>> +    if (!data) {
> >>>>> +        res = ctx->encoder->pVtbl->QueryOutput(ctx->encoder, &data);
> >>>>> +        if (data) {
> >>>>> +            AMFBuffer* buffer;
> >>>>> +            AMFGuid guid = IID_AMFBuffer();
> >>>>> +            data->pVtbl->QueryInterface(data, &guid, (void**)&buffer); // query for buffer interface
> >>>>> +            ret = amf_copy_buffer(avctx, pkt, buffer);
> >>>>> +            if (!ret)
> >>>>> +                *got_packet = 1;
> >>>>> +            buffer->pVtbl->Release(buffer);
> >>>>> +            data->pVtbl->Release(data);
> >>>>> +            if (ctx->eof) {
> >>>>> +                submitted = true; // we are in the drain state - no submissions
> >>>>> +            }
> >>>>> +        } else if (res == AMF_EOF) {
> >>>>> +            submitted = true; // drain complete
> >>>>> +        } else {
> >>>>> +            if (!submitted) {
> >>>>> +                av_usleep(1000); // wait and poll again
> >>>>> +            }
> >>>>> +        }
> >>>>> +    }
> >>>>> +    }
> >>>>
> >>>> I suspect this setup is not actually going to follow the
> >>>> constraints of the deprecated encode2(). Given the API here, I
> >>>> think you would be much better off writing with
> send_frame()/receive_packet().
> >>>
> >>> I considered this, but without a thread that would call
> >>> receive_packet() the implementation will fall into the same pattern
> >>> as it is
> >> now but with an additional queue of ready outputs.
> >>
> >> See the documentation - you can return EAGAIN to send_packet() when a
> >> frame is available, no external queue is required.
> >
> > I didn’t debug it but from visual code inspection this logic is available for
> decoders only.
> > For encoder call avcodec_send_frame() inside do_video_out() doesn’t
> > check for EAGAIN and inside avcodec_send_frame() there is no such
> checking.
>
> Right, sorry, I mean return EAGAIN to receive_packet().
>
> This does need to have some way to tell whether the SubmitInput() call will
> return AMF_INPUT_FULL. If you have that, then you can block in
> receive_packet() iff that is true - if not, just return EAGAIN and get another
> frame. This also maximises the number of frames in-flight for the
> asynchronous encode.
If SubmitInput() returns AMF_INPUT_FULL inside send_frame(), there is no way I can trick
ffmpeg.exe into resubmitting this frame. If I block there, then receive_packet() will not be called.
It is impossible to tell ahead of time that SubmitInput() will return AMF_INPUT_FULL,
because it is HW-specific. IMHO the encoder submission should be organized the
same way as decoder submission; until then I don't see how to implement this. Keep in mind that the AMF API
is designed exactly the same way as the send_frame()/receive_packet() pair of functions.
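To make the disagreement concrete, here is a toy model of the flow Mark describes (all names are hypothetical; a real implementation would call the AMF SubmitInput()/QueryOutput() pair): receive_packet() returns EAGAIN while the input queue has room, so the caller keeps feeding frames, and only a full queue would force blocking.

```c
/* Toy model of the send_frame()/receive_packet() interaction with a
 * bounded asynchronous input queue, as in the AMF discussion. */
#define MOCK_EAGAIN (-11)
#define MOCK_FULL   (-1)  /* analogue of AMF_INPUT_FULL */
#define QUEUE_CAP   2     /* assumed HW queue depth */

struct MockEnc {
    int in_flight; /* frames submitted but not yet returned as packets */
};

/* send_frame(): submit unless the (simulated) HW queue is full. */
static int mock_send_frame(struct MockEnc *e)
{
    if (e->in_flight >= QUEUE_CAP)
        return MOCK_FULL;
    e->in_flight++;
    return 0;
}

/* receive_packet(): hand back a packet if one is ready, otherwise
 * return EAGAIN so the caller submits another frame instead of
 * blocking, keeping the asynchronous pipeline as full as possible. */
static int mock_receive_packet(struct MockEnc *e)
{
    if (e->in_flight == 0)
        return MOCK_EAGAIN;
    e->in_flight--;
    return 0;
}
```

The open question from the thread is exactly the MOCK_FULL branch: with real hardware there is no reliable way to predict it before calling SubmitInput().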
>
> >>>>> +    // Bitrate
> >>>>> +    AMF_ASSIGN_PROPERTY_INT64(res, ctx->encoder, AMF_VIDEO_ENCODER_TARGET_BITRATE, avctx->bit_rate);
> >>>>> +    if (ctx->rate_control_mode == AMF_VIDEO_ENCODER_RATE_CONTROL_METHOD_CBR) {
> >>>>> +        AMF_ASSIGN_PROPERTY_INT64(res, ctx->encoder, AMF_VIDEO_ENCODER_PEAK_BITRATE, avctx->bit_rate);
> >>>>> +    } else {
> >>>>> +        int rc_max_rate = avctx->rc_max_rate >= avctx->bit_rate ? avctx->rc_max_rate : avctx->bit_rate * 13 / 10;
> >>>>
> >>>> Where does 13/10 come from?
> >>>>
> >>>
> >>> For best results rc_max_rate should be bigger then bit_rate for
> >>> VBR. For
> >> CBR it is ignored.
> >>> What is the better way to correct an unset parameter?
> >>
> >> Max rate should only be constrained by the HRD buffering in this
> >> case. Are you sure this isn't handled internally if you don't supply the
> paramater at all?
> >> If not, maybe supply some sort of infinity to avoid it constraining
> >> anything appropriately.
> >
> > I concern about VBR with peaks. For this mode max rate defines height of
> the peaks.
> > If not defined the encoder will put something valid, but idea was to
> > control such thing explicitly from FFmpeg.
>
> If it is set by the user then certainly use that value, but if not then it's
> probably better to make up the answer inside the driver rather than here
> (the driver will have better knowledge of how the hardware works, after all).
>
> >> Some more random questions:
> >>
> >> * How can we supply colour information to the codecs? (VUI
> >> colour_primaries/transfer_characteristics/matrix_coefficients/chroma_
> >> samp
> >> le_loc_type.)
> >
> > There is very limited set of VUI parameters available: timing (shared
> > with rate control via frame rate), aspect ratio, and video_full_range_flag,
> bit stream restriction and few other related to reordering .
>
> Inability to set the colour information will do nasty things to the output video
> in some cases. Input coming from JPEG or similar with centred chroma
> samples is visibly weird viewed with the default middle-left, and HDR colours
> will come out incorrectly. I recommend you add a way to do this, though it
> won't block the rest of the patch.
VUI is generated deep in the driver or in firmware, so you are asking for a driver feature. I will
definitely bring this to the driver team and make sure they implement it at some point.
Such input is truly appreciated.
>
> Thanks,
>
> - Mark
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Thanks,
Mikhail