[FFmpeg-devel] Added - HW accelerated H.264 and HEVC encoding for AMD GPUs based on AMF SDK
Mironov, Mikhail
Mikhail.Mironov at amd.com
Mon Oct 30 00:57:31 EET 2017
> -----Original Message-----
> From: ffmpeg-devel [mailto:ffmpeg-devel-bounces at ffmpeg.org] On Behalf
> Of Mark Thompson
> Sent: October 29, 2017 5:51 PM
> To: ffmpeg-devel at ffmpeg.org
> Subject: Re: [FFmpeg-devel] Added - HW accelerated H.264 and HEVC
> encoding for AMD GPUs based on AMF SDK
>
> On 29/10/17 20:48, Mironov, Mikhail wrote:
> >> -----Original Message-----
> >> From: ffmpeg-devel [mailto:ffmpeg-devel-bounces at ffmpeg.org] On
> Behalf
> >> Of Mark Thompson
> >> Sent: October 29, 2017 3:36 PM
> >> To: ffmpeg-devel at ffmpeg.org
> >> Subject: Re: [FFmpeg-devel] Added - HW accelerated H.264 and HEVC
> >> encoding for AMD GPUs based on AMF SDK
> >>
> >> On 29/10/17 18:39, Mironov, Mikhail wrote:
> >>>>
> >>>>> + { AV_PIX_FMT_D3D11, AMF_SURFACE_NV12 },
> >>>>
> >>>> D3D11 surfaces need not be NV12. The actual format is in
> >>>> AVHWFramesContext.sw_format - if you only support 8-bit then
> >>>> something nasty probably happens if you give it P010 surfaces.
> >>>>
> >>>
> >>> Agreed, but how should I define D3D11 NV12 as input format if I only
> >>> have
> >> AV_PIX_FMT_D3D11?
> >>
> >> Check sw_format afterwards.
> >>
> >
> > sw_format is part of AVHWFramesContext in hwcodec.h But how should I
> > define pix_fmt array in AVCodec? For example, In nvenc.c is done via
> > AV_PIX_FMT_CUDA. Is it wrong?
>
> NVENC checks sw_format at init time - see
> <http://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavcodec/nvenc.c;h=e1d3316de39cfefa41432105aed91f0d4e93a154;hb=HEAD#l1408>. I think you want
> to do the same thing.
>
> (I agree this result isn't ideal with just the pix_fmt array, but there isn't
> currently a way to check sw_format as well from the outside. Checking it at
> init time and failing if it isn't usable is the best we can do.)
>
Sure, I will add the check at initialization.
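For reference, a standalone sketch of that init-time check (the mock types and the helper name amf_check_sw_format are hypothetical; the real code would read sw_format from the AVHWFramesContext behind avctx->hw_frames_ctx, the same way nvenc does):

```c
/* Hypothetical sketch: reject unsupported sw_formats at init time
 * instead of producing garbage later. MockPixFmt stands in for
 * AVPixelFormat; the real check would read
 * ((AVHWFramesContext*)avctx->hw_frames_ctx->data)->sw_format. */
enum MockPixFmt { MOCK_PIX_FMT_NV12, MOCK_PIX_FMT_P010 };

static int amf_check_sw_format(enum MockPixFmt sw_format)
{
    /* only 8-bit NV12 surfaces assumed supported here; anything
     * else (e.g. P010) fails init instead of encoding bad frames */
    if (sw_format != MOCK_PIX_FMT_NV12)
        return -1; /* AVERROR(EINVAL) in the real encoder */
    return 0;
}
```

With this, an unsupported surface format is reported to the user before any frame is submitted.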
> >>>>> +
> >>>>> + // try to reuse existing DX device
> >>>>> +
> >>>>> + if (avctx->hw_frames_ctx) {
> >>>>> +        AVHWFramesContext *device_ctx = (AVHWFramesContext*)avctx->hw_frames_ctx->data;
> >>>>> +        if (device_ctx->device_ctx->type == AV_HWDEVICE_TYPE_D3D11VA){
> >>>>> +            if (device_ctx->device_ctx->hwctx) {
> >>>>> +                AVD3D11VADeviceContext *device_d3d11 = (AVD3D11VADeviceContext *)device_ctx->device_ctx->hwctx;
> >>>>> +                res = ctx->context->pVtbl->InitDX11(ctx->context, device_d3d11->device, AMF_DX11_1);
> >>>>> +                if (res == AMF_OK) {
> >>>>> +                    ctx->hw_frames_ctx = av_buffer_ref(avctx->hw_frames_ctx);
> >>>>> +                } else {
> >>>>> +                    av_log(avctx, AV_LOG_INFO, "amf_shared: avctx->hw_frames_ctx has non-AMD device, switching to default");
> >>>>
> >>>> I'm not sure this is going to act sensibly - if the user has D3D11
> >>>> frames input on another device, does it work?
> >>>>
> >>>
> >>> If device is not AMD's the code is trying to create another device
> >>> - the
> >> compatible one .
> >>> In these cases the submission will be via system memory.
> >>
> >> And that works with D3D11 frames as hw_frames_ctx on another device?
> >
> > Why not? AMF will be initialized with a different device, in
> > submission code it is detected, surface with system (host) memory is
> > allocated, data will be copied using av_image_copy() and host memory will
> be submitted to AMF.
>
> It won't be copied externally if it's already in a D3D11 surface for the other
> device. At the moment it looks like it silently encodes an empty frame
> because the conditions in ff_amf_encode_frame() make it call
> amf_copy_surface(), but av_image_copy() doesn't support copying from
> hardware surfaces and has no way to indicate that failure.
>
> I think the easiest way to solve this is to fail at init time if the given
> hw_frames_ctx is on an unusable device. The hw_device_ctx case matters
> less, because the user input is not going to be in D3D11 frames in that case.
I thought that in a multi-GPU situation it would still be good to support encoding.
I could force the HW frame to be copied into system memory and consume it in the encoder via
transfer_data_to(). Would that be better than failing at initialization?
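A sketch of that fallback path (all types and names here are mock stand-ins; in the real encoder the download would be av_hwframe_transfer_data() from the D3D11 frame into a system-memory AVFrame before submission):

```c
#include <string.h>

/* Hypothetical model of the multi-GPU fallback: if the input frame
 * lives on a foreign device, download it to system memory first and
 * submit the copy. MockFrame stands in for AVFrame. */
struct MockFrame {
    int is_hw;              /* 1: "hardware" surface, 0: system memory */
    unsigned char data[16]; /* stand-in for pixel data */
};

static int amf_download_frame(struct MockFrame *dst, const struct MockFrame *src)
{
    if (!src->is_hw)
        return -1; /* already in system memory, nothing to download */
    memcpy(dst->data, src->data, sizeof(dst->data));
    dst->is_hw = 0; /* the copy is consumable by the host submission path */
    return 0;
}
```

The cost is an extra GPU-to-host copy per frame, but encoding keeps working when the frames were allocated on a non-AMD device.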
>
> >>>>> +        } else {
> >>>>> +            surface->pVtbl->Release(surface);
> >>>>> +            surface = NULL;
> >>>>> +            AMF_RETURN_IF_FALSE(ctx, res == AMF_OK, AVERROR_UNKNOWN, "SubmitInput() failed with error %d", res);
> >>>>> +            submitted = 1;
> >>>>> +        }
> >>>>> +    }
> >>>>> +    // poll results
> >>>>> +    if (!data) {
> >>>>> +        res = ctx->encoder->pVtbl->QueryOutput(ctx->encoder, &data);
> >>>>> +        if (data) {
> >>>>> +            AMFBuffer* buffer;
> >>>>> +            AMFGuid guid = IID_AMFBuffer();
> >>>>> +            data->pVtbl->QueryInterface(data, &guid, (void**)&buffer); // query for buffer interface
> >>>>> +            ret = amf_copy_buffer(avctx, pkt, buffer);
> >>>>> +            if (!ret)
> >>>>> +                *got_packet = 1;
> >>>>> +            buffer->pVtbl->Release(buffer);
> >>>>> +            data->pVtbl->Release(data);
> >>>>> +            if (ctx->eof) {
> >>>>> +                submitted = true; // we are in the drain state - no submissions
> >>>>> +            }
> >>>>> +        } else if (res == AMF_EOF) {
> >>>>> +            submitted = true; // drain complete
> >>>>> +        } else {
> >>>>> +            if (!submitted) {
> >>>>> +                av_usleep(1000); // wait and poll again
> >>>>> +            }
> >>>>> +        }
> >>>>> +    }
> >>>>> +    }
> >>>>
> >>>> I suspect this setup is not actually going to follow the
> >>>> constraints of the deprecated encode2(). Given the API here, I
> >>>> think you would be much better off writing with
> send_frame()/receive_packet().
> >>>
> >>> I considered this, but without a thread that would call
> >>> receive_packet() the implementation will fall into the same pattern
> >>> as it is
> >> now but with an additional queue of ready outputs.
> >>
> >> See the documentation - you can return EAGAIN to send_packet() when a
> >> frame is available, no external queue is required.
> >
> > I didn’t debug it but from visual code inspection this logic is available for
> decoders only.
> > For encoder call avcodec_send_frame() inside do_video_out() doesn’t
> > check for EAGAIN and inside avcodec_send_frame() there is no such
> checking.
>
> Right, sorry, I mean return EAGAIN to receive_packet().
>
> This does need to have some way to tell whether the SubmitInput() call will
> return AMF_INPUT_FULL. If you have that, then you can block in
> receive_packet() iff that is true - if not, just return EAGAIN and get another
> frame. This also maximises the number of frames in-flight for the
> asynchronous encode.
If SubmitInput() returns AMF_INPUT_FULL inside send_frame(), there is no way I can trick
ffmpeg.exe into resubmitting this frame. If I block there, then receive_packet() will not be called.
It is impossible to tell ahead of time that SubmitInput() will return AMF_INPUT_FULL,
because it is HW-specific. IMHO the encoder submission should be organized the
same way as decoder submission; until then I don't see how to implement this. Keep in mind that the AMF API
is designed exactly the same way as the send_frame()/receive_packet() pair of functions.
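To make the disagreement concrete, here is a toy model of the flow Mark describes (all names are hypothetical; a real implementation would call the AMF SubmitInput()/QueryOutput() pair): receive_packet() returns EAGAIN while the input queue has room, so the caller keeps feeding frames, and only a full queue would force blocking.

```c
/* Toy model of the send_frame()/receive_packet() interaction with a
 * bounded asynchronous input queue, as in the AMF discussion. */
#define MOCK_EAGAIN (-11)
#define MOCK_FULL   (-1)  /* analogue of AMF_INPUT_FULL */
#define QUEUE_CAP   2     /* assumed HW queue depth */

struct MockEnc {
    int in_flight; /* frames submitted but not yet returned as packets */
};

/* send_frame(): submit unless the (simulated) HW queue is full. */
static int mock_send_frame(struct MockEnc *e)
{
    if (e->in_flight >= QUEUE_CAP)
        return MOCK_FULL;
    e->in_flight++;
    return 0;
}

/* receive_packet(): hand back a packet if one is ready, otherwise
 * return EAGAIN so the caller submits another frame instead of
 * blocking, keeping the asynchronous pipeline as full as possible. */
static int mock_receive_packet(struct MockEnc *e)
{
    if (e->in_flight == 0)
        return MOCK_EAGAIN;
    e->in_flight--;
    return 0;
}
```

The open question from the thread is exactly the MOCK_FULL branch: with real hardware there is no reliable way to predict it before calling SubmitInput().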
>
> >>>>> +    // Bitrate
> >>>>> +    AMF_ASSIGN_PROPERTY_INT64(res, ctx->encoder, AMF_VIDEO_ENCODER_TARGET_BITRATE, avctx->bit_rate);
> >>>>> +    if (ctx->rate_control_mode == AMF_VIDEO_ENCODER_RATE_CONTROL_METHOD_CBR) {
> >>>>> +        AMF_ASSIGN_PROPERTY_INT64(res, ctx->encoder, AMF_VIDEO_ENCODER_PEAK_BITRATE, avctx->bit_rate);
> >>>>> +    } else {
> >>>>> +        int rc_max_rate = avctx->rc_max_rate >= avctx->bit_rate ? avctx->rc_max_rate : avctx->bit_rate * 13 / 10;
> >>>>
> >>>> Where does 13/10 come from?
> >>>>
> >>>
> >>> For best results rc_max_rate should be bigger then bit_rate for
> >>> VBR. For
> >> CBR it is ignored.
> >>> What is the better way to correct an unset parameter?
> >>
> >> Max rate should only be constrained by the HRD buffering in this
> >> case. Are you sure this isn't handled internally if you don't supply the
> paramater at all?
> >> If not, maybe supply some sort of infinity to avoid it constraining
> >> anything appropriately.
> >
> > I concern about VBR with peaks. For this mode max rate defines height of
> the peaks.
> > If not defined the encoder will put something valid, but idea was to
> > control such thing explicitly from FFmpeg.
>
> If it is set by the user then certainly use that value, but if not then it's
> probably better to make up the answer inside the driver rather than here
> (the driver will have better knowledge of how the hardware works, after all).
>
> >> Some more random questions:
> >>
> >> * How can we supply colour information to the codecs? (VUI
> >> colour_primaries/transfer_characteristics/matrix_coefficients/chroma_
> >> samp
> >> le_loc_type.)
> >
> > There is very limited set of VUI parameters available: timing (shared
> > with rate control via frame rate), aspect ratio, and video_full_range_flag,
> bit stream restriction and few other related to reordering .
>
> Inability to set the colour information will do nasty things to the output video
> in some cases. Input coming from JPEG or similar with centred chroma
> samples is visibly weird viewed with the default middle-left, and HDR colours
> will come out incorrectly. I recommend you add a way to do this, though it
> won't block the rest of the patch.
VUI is generated deep in the driver or in firmware, so you are asking for a driver feature. I will
definitely bring this to the driver team and make sure they implement it at some point.
Such input is truly appreciated.
>
> Thanks,
>
> - Mark
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Thanks,
Mikhail