[FFmpeg-devel] Added - HW accelerated H.264 and HEVC encoding for AMD GPUs based on AMF SDK

Mark Thompson sw at jkqxz.net
Sun Oct 29 23:51:07 EET 2017


On 29/10/17 20:48, Mironov, Mikhail wrote:
>> -----Original Message-----
>> From: ffmpeg-devel [mailto:ffmpeg-devel-bounces at ffmpeg.org] On Behalf
>> Of Mark Thompson
>> Sent: October 29, 2017 3:36 PM
>> To: ffmpeg-devel at ffmpeg.org
>> Subject: Re: [FFmpeg-devel] Added - HW accelerated H.264 and HEVC
>> encoding for AMD GPUs based on AMF SDK
>>
>> On 29/10/17 18:39, Mironov, Mikhail wrote:
>>>>
>>>>> +    { AV_PIX_FMT_D3D11,      AMF_SURFACE_NV12 },
>>>>
>>>> D3D11 surfaces need not be NV12.  The actual format is in
>>>> AVHWFramesContext.sw_format - if you only support 8-bit then
>>>> something nasty probably happens if you give it P010 surfaces.
>>>>
>>>
>>> Agreed, but how should I define D3D11 NV12 as input format if I only have
>> AV_PIX_FMT_D3D11?
>>
>> Check sw_format afterwards.
>>
> 
> sw_format is part of AVHWFramesContext in hwcontext.h. 
> But how should I define the pix_fmt array in AVCodec? For example, in nvenc.c 
> it is done via AV_PIX_FMT_CUDA. Is it wrong?

NVENC checks sw_format at init time - see <http://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavcodec/nvenc.c;h=e1d3316de39cfefa41432105aed91f0d4e93a154;hb=HEAD#l1408>.  I think you want to do the same thing.

(I agree this result isn't ideal with just the pix_fmt array, but there isn't currently a way to check sw_format as well from the outside.  Checking it at init time and failing if it isn't usable is the best we can do.)
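
For illustration, the init-time check could look roughly like this (a sketch only; the function name is made up and it assumes NV12 is the only supported software format):

static int amf_check_format(AVCodecContext *avctx)
{
    if (avctx->pix_fmt == AV_PIX_FMT_D3D11) {
        AVHWFramesContext *frames_ctx;

        if (!avctx->hw_frames_ctx) {
            av_log(avctx, AV_LOG_ERROR, "hw_frames_ctx must be set for D3D11 input.\n");
            return AVERROR(EINVAL);
        }
        frames_ctx = (AVHWFramesContext*)avctx->hw_frames_ctx->data;
        if (frames_ctx->sw_format != AV_PIX_FMT_NV12) {
            av_log(avctx, AV_LOG_ERROR, "Unsupported sw_format %s, only NV12 is supported.\n",
                   av_get_pix_fmt_name(frames_ctx->sw_format));
            return AVERROR(EINVAL);
        }
    }
    return 0;
}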

>>>>> +
>>>>> +    // try to reuse existing DX device
>>>>> +
>>>>> +    if (avctx->hw_frames_ctx) {
>>>>> +        AVHWFramesContext *device_ctx = (AVHWFramesContext*)avctx->hw_frames_ctx->data;
>>>>> +        if (device_ctx->device_ctx->type == AV_HWDEVICE_TYPE_D3D11VA){
>>>>> +            if (device_ctx->device_ctx->hwctx) {
>>>>> +                AVD3D11VADeviceContext *device_d3d11 = (AVD3D11VADeviceContext *)device_ctx->device_ctx->hwctx;
>>>>> +                res = ctx->context->pVtbl->InitDX11(ctx->context, device_d3d11->device, AMF_DX11_1);
>>>>> +                if (res == AMF_OK) {
>>>>> +                    ctx->hw_frames_ctx = av_buffer_ref(avctx->hw_frames_ctx);
>>>>> +                } else {
>>>>> +                    av_log(avctx, AV_LOG_INFO, "amf_shared: avctx->hw_frames_ctx has non-AMD device, switching to default");
>>>>
>>>> I'm not sure this is going to act sensibly - if the user has D3D11
>>>> frames input on another device, does it work?
>>>>
>>>
>>> If the device is not AMD's, the code tries to create another, compatible device.
>>> In that case the submission will be via system memory.
>>
>> And that works with D3D11 frames as hw_frames_ctx on another device?
> 
> Why not? AMF will be initialized with a different device; 
> in the submission code this is detected, a surface in system (host) memory is allocated, 
> the data is copied using av_image_copy() and the host memory is submitted to AMF.

It won't be copied externally if it's already in a D3D11 surface for the other device.  At the moment it looks like it silently encodes an empty frame because the conditions in ff_amf_encode_frame() make it call amf_copy_surface(), but av_image_copy() doesn't support copying from hardware surfaces and has no way to indicate that failure.

I think the easiest way to solve this is to fail at init time if the given hw_frames_ctx is on an unusable device.  The hw_device_ctx case matters less, because the user input is not going to be in D3D11 frames in that case.
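
That is, something like the following at init time (a sketch only; variable names are illustrative, the AMF calls are the ones from your patch):

    if (avctx->hw_frames_ctx) {
        AVHWFramesContext *frames_ctx = (AVHWFramesContext*)avctx->hw_frames_ctx->data;
        if (frames_ctx->device_ctx->type == AV_HWDEVICE_TYPE_D3D11VA) {
            AVD3D11VADeviceContext *device_d3d11 = frames_ctx->device_ctx->hwctx;
            res = ctx->context->pVtbl->InitDX11(ctx->context, device_d3d11->device, AMF_DX11_1);
            if (res != AMF_OK) {
                av_log(avctx, AV_LOG_ERROR, "The given hw_frames_ctx is on a device "
                       "AMF cannot use; refusing to encode from it.\n");
                return AVERROR(EINVAL);
            }
            ctx->hw_frames_ctx = av_buffer_ref(avctx->hw_frames_ctx);
            if (!ctx->hw_frames_ctx)
                return AVERROR(ENOMEM);
        }
    }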

>>>>> +            } else {
>>>>> +                surface->pVtbl->Release(surface);
>>>>> +                surface = NULL;
>>>>> +                AMF_RETURN_IF_FALSE(ctx, res == AMF_OK,
>>>> AVERROR_UNKNOWN, "SubmitInput() failed with error %d", res);
>>>>> +                submitted = 1;
>>>>> +            }
>>>>> +        }
>>>>> +        // poll results
>>>>> +        if (!data) {
>>>>> +            res = ctx->encoder->pVtbl->QueryOutput(ctx->encoder, &data);
>>>>> +            if (data) {
>>>>> +                AMFBuffer* buffer;
>>>>> +                AMFGuid guid = IID_AMFBuffer();
>>>>> +                data->pVtbl->QueryInterface(data, &guid, (void**)&buffer); // query for buffer interface
>>>>> +                ret = amf_copy_buffer(avctx, pkt, buffer);
>>>>> +                if (!ret)
>>>>> +                    *got_packet = 1;
>>>>> +                buffer->pVtbl->Release(buffer);
>>>>> +                data->pVtbl->Release(data);
>>>>> +                if (ctx->eof) {
>>>>> +                    submitted = true; // we are in the drain state - no
>> submissions
>>>>> +                }
>>>>> +            } else if (res == AMF_EOF) {
>>>>> +                submitted = true; // drain complete
>>>>> +            } else {
>>>>> +                if (!submitted) {
>>>>> +                    av_usleep(1000); // wait and poll again
>>>>> +                }
>>>>> +            }
>>>>> +        }
>>>>> +    }
>>>>
>>>> I suspect this setup is not actually going to follow the constraints
>>>> of the deprecated encode2().  Given the API here, I think you would
>>>> be much better off writing with send_frame()/receive_packet().
>>>
>>> I considered this, but without a thread that would call
>>> receive_packet() the implementation will fall into the same pattern as it is
>> now but with an additional queue of ready outputs.
>>
>> See the documentation - you can return EAGAIN to send_packet() when a
>> frame is available, no external queue is required.
> 
> I didn’t debug it, but from visual code inspection this logic is available for decoders only. 
> For the encoder, the avcodec_send_frame() call inside do_video_out() doesn’t check for EAGAIN, 
> and inside avcodec_send_frame() there is no such check.

Right, sorry, I mean return EAGAIN to receive_packet().

This does need to have some way to tell whether the SubmitInput() call will return AMF_INPUT_FULL.  If you have that, then you can block in receive_packet() iff that is true - if not, just return EAGAIN and get another frame.  This also maximises the number of frames in-flight for the asynchronous encode.
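
As a rough sketch of the shape I mean (the context member names such as input_full are hypothetical; the AMF calls are the ones already in your patch):

static int amf_receive_packet(AVCodecContext *avctx, AVPacket *pkt)
{
    AmfContext *ctx  = avctx->priv_data;
    AMFData    *data = NULL;
    AMF_RESULT  res;

    while (1) {
        res = ctx->encoder->pVtbl->QueryOutput(ctx->encoder, &data);
        if (data) {
            AMFBuffer *buffer;
            AMFGuid    guid = IID_AMFBuffer();
            int        ret;

            data->pVtbl->QueryInterface(data, &guid, (void**)&buffer);
            ret = amf_copy_buffer(avctx, pkt, buffer);
            buffer->pVtbl->Release(buffer);
            data->pVtbl->Release(data);
            return ret;
        }
        if (res == AMF_EOF)
            return AVERROR_EOF;
        if (!ctx->input_full)
            return AVERROR(EAGAIN); // input not full: ask lavc for another frame instead of waiting
        av_usleep(1000);            // input full: block here until output appears
    }
}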

>>>>> +    // Bitrate
>>>>> +    AMF_ASSIGN_PROPERTY_INT64(res, ctx->encoder,
>>>> AMF_VIDEO_ENCODER_TARGET_BITRATE, avctx->bit_rate);
>>>>> +    if (ctx->rate_control_mode ==
>>>> AMF_VIDEO_ENCODER_RATE_CONTROL_METHOD_CBR) {
>>>>> +        AMF_ASSIGN_PROPERTY_INT64(res, ctx->encoder,
>>>> AMF_VIDEO_ENCODER_PEAK_BITRATE, avctx->bit_rate);
>>>>> +    } else {
>>>>> +        int rc_max_rate = avctx->rc_max_rate >= avctx->bit_rate ? avctx->rc_max_rate : avctx->bit_rate * 13 / 10;
>>>>
>>>> Where does 13/10 come from?
>>>>
>>>
>>> For best results rc_max_rate should be bigger than bit_rate for VBR. For
>>> CBR it is ignored.
>>> What is a better way to correct an unset parameter?
>>
>> Max rate should only be constrained by the HRD buffering in this case.  Are
>> you sure this isn't handled internally if you don't supply the parameter at all?
>> If not, maybe supply some sort of infinity so that it doesn't constrain anything.
> 
> My concern is about VBR with peaks. In this mode the max rate defines the height of the peaks. 
> If it is not defined the encoder will pick something valid, but the idea was to control such things explicitly 
> from FFmpeg.

If it is set by the user then certainly use that value, but if not then it's probably better to make up the answer inside the driver rather than here (the driver will have better knowledge of how the hardware works, after all).
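
In other words, something like (sketch only, using the macro from your patch):

    if (avctx->rc_max_rate) {
        AMF_ASSIGN_PROPERTY_INT64(res, ctx->encoder,
                                  AMF_VIDEO_ENCODER_PEAK_BITRATE, avctx->rc_max_rate);
    } else {
        av_log(avctx, AV_LOG_DEBUG,
               "rc_max_rate not set, leaving the peak bitrate to the encoder defaults.\n");
    }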

>> Some more random questions:
>>
>> * How can we supply colour information to the codecs?  (VUI
>> colour_primaries/transfer_characteristics/matrix_coefficients/chroma_sample_loc_type.)
> 
> There is a very limited set of VUI parameters available: timing (shared with rate control via frame rate), 
> aspect ratio, video_full_range_flag, bitstream restrictions and a few others related to reordering.

Inability to set the colour information will do nasty things to the output video in some cases.  Input coming from JPEG or similar with centred chroma samples looks visibly weird when decoded assuming the default middle-left siting, and HDR colours will come out incorrectly.  I recommend you add a way to do this, though it won't block the rest of the patch.
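
For reference, the information is already carried on AVCodecContext, so only the AMF-side properties are missing.  Something like this shows the mapping (illustration only, not working AMF code):

    av_log(avctx, AV_LOG_VERBOSE,
           "primaries=%d trc=%d matrix=%d chroma_loc=%d range=%d\n",
           avctx->color_primaries,        // VUI colour_primaries
           avctx->color_trc,              // VUI transfer_characteristics
           avctx->colorspace,             // VUI matrix_coefficients
           avctx->chroma_sample_location, // VUI chroma_sample_loc_type
           avctx->color_range);           // VUI video_full_range_flag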

Thanks,

- Mark

