[FFmpeg-devel] Added HW H.264 and HEVC encoding for AMD GPUs based on AMF SDK
Mark Thompson
sw at jkqxz.net
Wed Nov 15 01:11:16 EET 2017
On 14/11/17 22:10, Mironov, Mikhail wrote:
>> On 14/11/17 17:14, Mironov, Mikhail wrote:
>>>>>>>>> +        res = ctx->factory->pVtbl->CreateContext(ctx->factory, &ctx->context);
>>>>>>>>> +        AMF_RETURN_IF_FALSE(ctx, res == AMF_OK, AVERROR_UNKNOWN, "CreateContext() failed with error %d\n", res);
>>>>>>>>> +        // try to reuse existing DX device
>>>>>>>>> +        if (avctx->hw_frames_ctx) {
>>>>>>>>> +            AVHWFramesContext *device_ctx = (AVHWFramesContext*)avctx->hw_frames_ctx->data;
>>>>>>>>> +            if (device_ctx->device_ctx->type == AV_HWDEVICE_TYPE_D3D11VA){
>>>>>>>>> +                if (amf_av_to_amf_format(device_ctx->sw_format) == AMF_SURFACE_UNKNOWN) {
>>>>>>>>
>>>>>>>> This test is inverted.
>>>>>>>>
>>>>>>>> Have you actually tested this path? Even with that test fixed,
>>>>>>>> I'm unable to pass the following initialisation test with an AMD
>>>>>>>> D3D11 device.
>>>>>>>>
>>>>>>>
>>>>>>> Yes, the condition should be inverted. To test I had to add
>>>>>>> "-hwaccel d3d11va -hwaccel_output_format d3d11" to the command line.
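(For reference, a minimal sketch of the corrected check; it assumes, as in the patch hunk above, that the device-reuse path should only proceed when sw_format maps to a known AMF surface format:)

    if (device_ctx->device_ctx->type == AV_HWDEVICE_TYPE_D3D11VA) {
        // Reuse the caller's D3D11 device only for formats AMF understands.
        if (amf_av_to_amf_format(device_ctx->sw_format) != AMF_SURFACE_UNKNOWN) {
            // ... hand the existing D3D11 device to the AMF context ...
        } else {
            av_log(avctx, AV_LOG_INFO,
                   "hw_frames_ctx has an sw_format unsupported by AMF, "
                   "switching to default device\n");
        }
    }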
>>>>>>
>>>>>> Yeah. I get:
>>>>>>
>>>>>> $ ./ffmpeg_g -y -hwaccel d3d11va -hwaccel_device 0 -hwaccel_output_format d3d11 -i ~/bbb_1080_264.mp4 -an -c:v h264_amf out.mp4
>>>>>> ...
>>>>>> [AVHWDeviceContext @ 000000000270e120] Created on device 1002:665f (AMD Radeon (TM) R7 360 Series).
>>>>>> ...
>>>>>> [h264_amf @ 00000000004dcd80] amf_shared: avctx->hw_frames_ctx has non-AMD device, switching to default
>>>>>>
>>>>>> It's then comedically slow in this state (about 2fps), but works
>>>>>> fine when the decode is in software.
>>>>>
>>>>> Is it possible that the iGPU is not disabled and is being used for
>>>>> decoding as adapter 0?
>>>>
>>>> There is an integrated GPU, but it's currently completely disabled.
>>>> (I made <https://lists.ffmpeg.org/pipermail/ffmpeg-devel/2017-November/219795.html>
>>>> to check that the device was definitely right.)
>>>>
>>>>> Can you provide a log from dxdiag.exe?
>>>>
>>>> <http://ixia.jkqxz.net/~mrt/DxDiag.txt>
>>>>
>>>>> If AMF created its own DX device then the submission logic and speed
>>>>> are the same as from a SW decoder.
>>>>> It would be interesting to see a short GPUVIEW log.
>>>>
>>>> My Windows knowledge is insufficient to get that immediately, but if
>>>> you think it's useful I can look into it?
>>>
>>> I think I know what is going on. You are on Win7. In Win7 the D3D11VA
>>> API is not available from MSFT.
>>> AMF will fall back to DX9-based encoding submission and this is why the
>>> message was produced.
>>> The AMF performance should be the same on DX9, but I don't know how
>>> decoding is done without D3D11VA support.
>>> GPUVIEW is not really needed if my assumptions are correct.
>>
>> Ah, that would make sense. Maybe detect it and fail earlier with a helpful
>> message - the current "not an AMD device" is wrong in this case.
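(A rough sketch of what failing earlier could look like; the detection point, message, and error code here are assumptions for illustration, not the patch's actual code:)

    /* Hypothetical early bail-out: if AMF cannot take over the caller's
     * D3D11 device (e.g. Win7, where AMF submits via DX9 only), report
     * that directly instead of the misleading "non-AMD device" message. */
    if (res != AMF_OK) {
        av_log(avctx, AV_LOG_ERROR,
               "AMF failed to initialise on the supplied D3D11 device "
               "(error %d); note that on Windows 7 AMF encodes via DX9 only.\n",
               res);
        return AVERROR(ENOSYS);
    }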
>>
>> Decode via D3D11 does work for me on Windows 7 with both AMD and Intel;
>> I don't know anything about how, though. (I don't really care about
>> Windows 7 - this was just a set of parts mashed together into a working
>> machine for testing; the Windows 7 install is inherited from elsewhere.)
>
> I ran this on Win7. What I see is that decoding does go via D3D11VA; the support
> comes with the Platform Update. But the AMF encoder works on Win7 via D3D9 only.
> That explains the performance hit: in D3D11, to copy the video output, the HW
> accelerator copies the frame via a staging texture.
> If I use DXVA2 for decoding it is faster because the staging texture is not needed.
> I am thinking of connecting dxva2 acceleration with the AMF encoder,
> but probably in the next phase.
> I've added more precise logging.
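(To illustrate the staging-texture cost described above, a generic D3D11 readback sketch in C; device, context, and src_tex are assumed to exist already, and this is not the AMF code itself:)

    /* D3D11 GPU textures are not CPU-mappable, so a readback needs an
     * extra copy into a STAGING texture first: the overhead noted above. */
    D3D11_TEXTURE2D_DESC desc;
    src_tex->lpVtbl->GetDesc(src_tex, &desc);
    desc.Usage          = D3D11_USAGE_STAGING;
    desc.BindFlags      = 0;
    desc.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
    desc.MiscFlags      = 0;

    ID3D11Texture2D *staging = NULL;
    device->lpVtbl->CreateTexture2D(device, &desc, NULL, &staging);
    context->lpVtbl->CopyResource(context, (ID3D11Resource*)staging,
                                  (ID3D11Resource*)src_tex);

    D3D11_MAPPED_SUBRESOURCE map;
    context->lpVtbl->Map(context, (ID3D11Resource*)staging, 0,
                         D3D11_MAP_READ, 0, &map);
    /* ... copy map.pData row by row into system memory ... */
    context->lpVtbl->Unmap(context, (ID3D11Resource*)staging, 0);
    staging->lpVtbl->Release(staging);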
>
>>
>>>>>>>>> + { "filler_data", "Filler Data Enable",
>> OFFSET(filler_data),
>>>>>>>> AV_OPT_TYPE_BOOL, { .i64 = 0 }, 0, 1, VE },
>>>>>>>>> + { "vbaq", "Enable VBAQ",
>>>> OFFSET(enable_vbaq),
>>>>>>>> AV_OPT_TYPE_BOOL, { .i64 = 0 }, 0, 1, VE },
>>>>>>>>> + { "frame_skipping", "Rate Control Based Frame Skip",
>>>>>>>> OFFSET(skip_frame), AV_OPT_TYPE_BOOL, { .i64 = 0 }, 0, 1, VE },
>>>>>>>>> +
>>>>>>>>> + /// QP Values
>>>>>>>>> + { "qp_i", "Quantization Parameter for I-Frame",
>>>> OFFSET(qp_i),
>>>>>>>> AV_OPT_TYPE_INT, { .i64 = -1 }, -1, 51, VE },
>>>>>>>>> + { "qp_p", "Quantization Parameter for P-Frame",
>>>>>> OFFSET(qp_p),
>>>>>>>> AV_OPT_TYPE_INT, { .i64 = -1 }, -1, 51, VE },
>>>>>>>>> + { "qp_b", "Quantization Parameter for B-Frame",
>>>>>> OFFSET(qp_b),
>>>>>>>> AV_OPT_TYPE_INT, { .i64 = -1 }, -1, 51, VE },
>>>>>>>>> +
>>>>>>>>> + /// Pre-Pass, Pre-Analysis, Two-Pass
>>>>>>>>> + { "preanalysis", "Pre-Analysis Mode",
>>>>>> OFFSET(preanalysis),
>>>>>>>> AV_OPT_TYPE_BOOL,{ .i64 = 0 }, 0, 1, VE, NULL },
>>>>>>>>> +
>>>>>>>>> + /// Maximum Access Unit Size
>>>>>>>>> + { "max_au_size", "Maximum Access Unit Size for rate control
>> (in
>>>>>> bits)",
>>>>>>>> OFFSET(max_au_size), AV_OPT_TYPE_INT, { .i64 = 0 }, 0,
>> INT_MAX,
>>>> VE
>>>>>> },
>>>>>>>>
>>>>>>>> Can you explain more about what this option does? I don't seem
>>>>>>>> to be able to get it to do anything - e.g. setting -max_au_size 80000
>>>>>>>> with 30fps CBR 1M (which should be easily achievable) still
>>>>>>>> makes packets of more than 80000 bits.
>>>>>>>>
>>>>>>>
>>>>>>> It means maximum frame size in bits, and it should be used
>>>>>>> together with enforce_hrd enabled. I tested, and it works after the
>>>>>>> related fix for enforce_hrd.
>>>>>>> I added dependency handling.
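(A sketch of what that dependency handling might look like; the field names follow the options quoted above, but the exact check is an assumption:)

    /* Hypothetical dependency check: max_au_size only takes effect with
     * HRD enforcement, so reject the combination up front. */
    if (ctx->max_au_size && !ctx->enforce_hrd) {
        av_log(avctx, AV_LOG_ERROR,
               "max_au_size requires enforce_hrd to be enabled\n");
        return AVERROR(EINVAL);
    }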
>>>>>>
>>>>>> $ ./ffmpeg_g -y -nostats -i ~/bbb_1080_264.mp4 -an -c:v h264_amf -bsf:v trace_headers -frames:v 1000 -enforce_hrd 1 -b:v 1M -maxrate 1M -max_au_size 80000 out.mp4 2>&1 | grep 'Packet: [0-9]\{5\}'
>>>>>> [AVBSFContext @ 00000000029d7f40] Packet: 11426 bytes, key frame, pts 128000, dts 128000.
>>>>>> [AVBSFContext @ 00000000029d7f40] Packet: 17623 bytes, key frame, pts 192000, dts 192000.
>>>>>> [AVBSFContext @ 00000000029d7f40] Packet: 23358 bytes, pts 249856, dts 249856.
>>>>>>
>>>>>> (That is, packets bigger than the supposed 80000-bit maximum: 23358 bytes is 186864 bits.) Expected?
>>>>>
>>>>> No, this is not expected. I tried the exact command line and did not
>>>>> get packets of more than 80000 bits. Sorry to ask, but did you apply
>>>>> the change in amfenc.h?
>>>>
>>>> I used the most recent patch on the list,
>>>> <https://lists.ffmpeg.org/pipermail/ffmpeg-devel/2017-November/219757.html>.
>>>> (Required a bit of fixup to apply, as Michael already noted.)
>>>
>>> Yes, I will submit the update today but I cannot repro large packets.
>>> Can you just check if you get the change:
>>>
>>> - typedef amf_uint16 amf_bool;
>>> + typedef amf_uint8 amf_bool;
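(One plausible reason the typedef width matters, shown with a standalone C demo rather than AMF code: when boolean storage is written at one width and read at another, the flag can be lost. Little-endian layout assumed.)

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    typedef uint8_t amf_bool;           /* after the fix; was 16-bit */

    int main(void)
    {
        uint16_t wide = 0x0100;         /* 16-bit "true": set bit in high byte */
        amf_bool narrow;

        memcpy(&narrow, &wide, sizeof(narrow)); /* 8-bit read sees low byte only */
        printf("narrow = %d\n", narrow);        /* prints 0: the flag is lost */
        return 0;
    }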
>>
>> Yes, I have that change.
>>
>> Could it be a difference in support for the particular card I am using (Bonaire
>> / GCN 2, so several generations old now), or will that be the same across all
>> of them?
>>
>
> I got a different clip and reproduced the issue. We discussed this with our main "rate control" guy.
> Basically, this parameter cannot guarantee the frame size for a complex scene when it is combined
> with a relatively low bitrate and a relatively low max AU size.
> To confirm this, it would be great if you could share your output stream (or the input stream)
> so we can verify that this is the case.
Input: <http://distribution.bbb3d.renderfarming.net/video/mp4/bbb_sunflower_1080p_60fps_normal.mp4>
Output: <http://ixia.jkqxz.net/~mrt/amf_max_au_size.mp4>
Looking at the transition on frame 976, the output quality is pretty bad, but not really bad enough to merit the failure - the macroblock QPs are only 37/38, and go higher on following frames.
- Mark