[FFmpeg-devel] Added HW H.264 and HEVC encoding for AMD GPUs based on AMF SDK
Mark Thompson
sw at jkqxz.net
Wed Nov 15 01:11:16 EET 2017
On 14/11/17 22:10, Mironov, Mikhail wrote:
>> On 14/11/17 17:14, Mironov, Mikhail wrote:
>>>>>>>>> +        res = ctx->factory->pVtbl->CreateContext(ctx->factory, &ctx->context);
>>>>>>>>> +        AMF_RETURN_IF_FALSE(ctx, res == AMF_OK, AVERROR_UNKNOWN, "CreateContext() failed with error %d\n", res);
>>>>>>>>> +        // try to reuse existing DX device
>>>>>>>>> +        if (avctx->hw_frames_ctx) {
>>>>>>>>> +            AVHWFramesContext *device_ctx = (AVHWFramesContext*)avctx->hw_frames_ctx->data;
>>>>>>>>> +            if (device_ctx->device_ctx->type == AV_HWDEVICE_TYPE_D3D11VA){
>>>>>>>>> +                if (amf_av_to_amf_format(device_ctx->sw_format) == AMF_SURFACE_UNKNOWN) {
>>>>>>>>
>>>>>>>> This test is inverted.
>>>>>>>>
>>>>>>>> Have you actually tested this path? Even with that test fixed,
>>>>>>>> I'm unable to pass the following initialisation test with an AMD
>>>>>>>> D3D11 device.
>>>>>>>>
>>>>>>>
>>>>>>> Yes, the condition should be inverted. To test I had to add
>>>>>>> "-hwaccel d3d11va -hwaccel_output_format d3d11" to the command line.
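(For reference, a minimal sketch of the corrected check; it assumes, as in the patch hunk above, that the device-reuse path should only proceed when sw_format maps to a known AMF surface format:)

    if (device_ctx->device_ctx->type == AV_HWDEVICE_TYPE_D3D11VA) {
        // Reuse the caller's D3D11 device only for formats AMF understands.
        if (amf_av_to_amf_format(device_ctx->sw_format) != AMF_SURFACE_UNKNOWN) {
            // ... hand the existing D3D11 device to the AMF context ...
        } else {
            av_log(avctx, AV_LOG_INFO,
                   "hw_frames_ctx has an sw_format unsupported by AMF, "
                   "switching to default device\n");
        }
    }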
>>>>>>
>>>>>> Yeah. I get:
>>>>>>
>>>>>> $ ./ffmpeg_g -y -hwaccel d3d11va -hwaccel_device 0 -hwaccel_output_format d3d11 -i ~/bbb_1080_264.mp4 -an -c:v h264_amf out.mp4
>>>>>> ...
>>>>>> [AVHWDeviceContext @ 000000000270e120] Created on device 1002:665f (AMD Radeon (TM) R7 360 Series).
>>>>>> ...
>>>>>> [h264_amf @ 00000000004dcd80] amf_shared: avctx->hw_frames_ctx has non-AMD device, switching to default
>>>>>>
>>>>>> It's then comedically slow in this state (about 2fps), but works
>>>>>> fine when the decode is in software.
>>>>>
>>>>> Is it possible that the iGPU is not disabled and is being used for
>>>>> decoding as adapter 0?
>>>>
>>>> There is an integrated GPU, but it's currently completely disabled.
>>>> (I made <https://lists.ffmpeg.org/pipermail/ffmpeg-devel/2017-November/219795.html>
>>>> to check that the device was definitely right.)
>>>>
>>>>> Can you provide a log from dxdiag.exe?
>>>>
>>>> <http://ixia.jkqxz.net/~mrt/DxDiag.txt>
>>>>
>>>>> If AMF created its own DX device then the submission logic and speed
>>>>> are the same as from a SW decoder.
>>>>> It would be interesting to see a short GPUVIEW log.
>>>>
>>>> My Windows knowledge is insufficient to get that immediately, but if
>>>> you think it's useful I can look into it?
>>>
>>> I think I know what is going on. You are on Win7. In Win7 the D3D11VA
>>> API is not available from MSFT.
>>> AMF will fall back to DX9-based encoding submission and this is why the
>>> message was produced.
>>> The AMF performance should be the same on DX9, but I don't know how
>>> decoding is done without D3D11VA support.
>>> GPUVIEW is not really needed if my assumptions are correct.
>>
>> Ah, that would make sense. Maybe detect it and fail earlier with a helpful
>> message - the current "not an AMD device" is wrong in this case.
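(A rough sketch of what failing earlier could look like; the detection point, message, and error code here are assumptions for illustration, not the patch's actual code:)

    /* Hypothetical early bail-out: if AMF cannot take over the caller's
     * D3D11 device (e.g. Win7, where AMF submits via DX9 only), report
     * that directly instead of the misleading "non-AMD device" message. */
    if (res != AMF_OK) {
        av_log(avctx, AV_LOG_ERROR,
               "AMF failed to initialise on the supplied D3D11 device "
               "(error %d); note that on Windows 7 AMF encodes via DX9 only.\n",
               res);
        return AVERROR(ENOSYS);
    }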
>>
>> Decode via D3D11 does work for me on Windows 7 with both AMD and Intel;
>> I don't know anything about how, though. (I don't really care about
>> Windows 7 - this was just a set of parts mashed together into a working
>> machine for testing; the Windows 7 install is inherited from elsewhere.)
>
> I ran this on Win7. What I see is that decoding does go via D3D11VA; the support
> comes with the Platform Update. But the AMF encoder works on Win7 via D3D9 only.
> That explains the performance hit: in D3D11, to copy the video output, the HW
> accelerator copies the frame via a staging texture.
> If I use DXVA2 for decoding it is faster because the staging texture is not needed.
> I am thinking of connecting dxva2 acceleration with the AMF encoder,
> but probably in the next phase.
> I've added more precise logging.
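(To illustrate the staging-texture cost described above, a generic D3D11 readback sketch in C; device, context, and src_tex are assumed to exist already, and this is not the AMF code itself:)

    /* D3D11 GPU textures are not CPU-mappable, so a readback needs an
     * extra copy into a STAGING texture first: the overhead noted above. */
    D3D11_TEXTURE2D_DESC desc;
    src_tex->lpVtbl->GetDesc(src_tex, &desc);
    desc.Usage          = D3D11_USAGE_STAGING;
    desc.BindFlags      = 0;
    desc.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
    desc.MiscFlags      = 0;

    ID3D11Texture2D *staging = NULL;
    device->lpVtbl->CreateTexture2D(device, &desc, NULL, &staging);
    context->lpVtbl->CopyResource(context, (ID3D11Resource*)staging,
                                  (ID3D11Resource*)src_tex);

    D3D11_MAPPED_SUBRESOURCE map;
    context->lpVtbl->Map(context, (ID3D11Resource*)staging, 0,
                         D3D11_MAP_READ, 0, &map);
    /* ... copy map.pData row by row into system memory ... */
    context->lpVtbl->Unmap(context, (ID3D11Resource*)staging, 0);
    staging->lpVtbl->Release(staging);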
>
>>
>>>>>>>>> + { "filler_data", "Filler Data Enable",
>> OFFSET(filler_data),
>>>>>>>> AV_OPT_TYPE_BOOL, { .i64 = 0 }, 0, 1, VE },
>>>>>>>>> + { "vbaq", "Enable VBAQ",
>>>> OFFSET(enable_vbaq),
>>>>>>>> AV_OPT_TYPE_BOOL, { .i64 = 0 }, 0, 1, VE },
>>>>>>>>> + { "frame_skipping", "Rate Control Based Frame Skip",
>>>>>>>> OFFSET(skip_frame), AV_OPT_TYPE_BOOL, { .i64 = 0 }, 0, 1, VE },
>>>>>>>>> +
>>>>>>>>> + /// QP Values
>>>>>>>>> + { "qp_i", "Quantization Parameter for I-Frame",
>>>> OFFSET(qp_i),
>>>>>>>> AV_OPT_TYPE_INT, { .i64 = -1 }, -1, 51, VE },
>>>>>>>>> + { "qp_p", "Quantization Parameter for P-Frame",
>>>>>> OFFSET(qp_p),
>>>>>>>> AV_OPT_TYPE_INT, { .i64 = -1 }, -1, 51, VE },
>>>>>>>>> + { "qp_b", "Quantization Parameter for B-Frame",
>>>>>> OFFSET(qp_b),
>>>>>>>> AV_OPT_TYPE_INT, { .i64 = -1 }, -1, 51, VE },
>>>>>>>>> +
>>>>>>>>> + /// Pre-Pass, Pre-Analysis, Two-Pass
>>>>>>>>> + { "preanalysis", "Pre-Analysis Mode",
>>>>>> OFFSET(preanalysis),
>>>>>>>> AV_OPT_TYPE_BOOL,{ .i64 = 0 }, 0, 1, VE, NULL },
>>>>>>>>> +
>>>>>>>>> + /// Maximum Access Unit Size
>>>>>>>>> + { "max_au_size", "Maximum Access Unit Size for rate control
>> (in
>>>>>> bits)",
>>>>>>>> OFFSET(max_au_size), AV_OPT_TYPE_INT, { .i64 = 0 }, 0,
>> INT_MAX,
>>>> VE
>>>>>> },
>>>>>>>>
>>>>>>>> Can you explain more about what this option does? I don't seem
>>>>>>>> to be able to get it to do anything - e.g. setting -max_au_size 80000
>>>>>>>> with 30fps CBR 1M (which should be easily achievable) still
>>>>>>>> makes packets of more than 80000 bits.
>>>>>>>>
>>>>>>>
>>>>>>> It means maximum frame size in bits, and it should be used
>>>>>>> together with enforce_hrd enabled. I tested, and it works after the
>>>>>>> related fix for enforce_hrd.
>>>>>>> I added dependency handling.
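(A sketch of what that dependency handling might look like; the field names follow the options quoted above, but the exact check is an assumption:)

    /* Hypothetical dependency check: max_au_size only takes effect with
     * HRD enforcement, so reject the combination up front. */
    if (ctx->max_au_size && !ctx->enforce_hrd) {
        av_log(avctx, AV_LOG_ERROR,
               "max_au_size requires enforce_hrd to be enabled\n");
        return AVERROR(EINVAL);
    }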
>>>>>>
>>>>>> $ ./ffmpeg_g -y -nostats -i ~/bbb_1080_264.mp4 -an -c:v h264_amf -bsf:v trace_headers -frames:v 1000 -enforce_hrd 1 -b:v 1M -maxrate 1M -max_au_size 80000 out.mp4 2>&1 | grep 'Packet: [0-9]\{5\}'
>>>>>> [AVBSFContext @ 00000000029d7f40] Packet: 11426 bytes, key frame, pts 128000, dts 128000.
>>>>>> [AVBSFContext @ 00000000029d7f40] Packet: 17623 bytes, key frame, pts 192000, dts 192000.
>>>>>> [AVBSFContext @ 00000000029d7f40] Packet: 23358 bytes, pts 249856, dts 249856.
>>>>>>
>>>>>> (That is, packets bigger than the supposed 80000-bit maximum: 23358 bytes is 186864 bits.) Expected?
>>>>>
>>>>> No, this is not expected. I tried the exact command line and did not
>>>>> get packets of more than 80000 bits. Sorry to ask, but did you apply
>>>>> the change in amfenc.h?
>>>>
>>>> I used the most recent patch on the list,
>>>> <https://lists.ffmpeg.org/pipermail/ffmpeg-devel/2017-November/219757.html>.
>>>> (Required a bit of fixup to apply, as Michael already noted.)
>>>
>>> Yes, I will submit the update today but I cannot repro large packets.
>>> Can you just check if you get the change:
>>>
>>> - typedef amf_uint16 amf_bool;
>>> + typedef amf_uint8 amf_bool;
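(One plausible reason the typedef width matters, shown with a standalone C demo rather than AMF code: when boolean storage is written at one width and read at another, the flag can be lost. Little-endian layout assumed.)

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    typedef uint8_t amf_bool;           /* after the fix; was 16-bit */

    int main(void)
    {
        uint16_t wide = 0x0100;         /* 16-bit "true": set bit in high byte */
        amf_bool narrow;

        memcpy(&narrow, &wide, sizeof(narrow)); /* 8-bit read sees low byte only */
        printf("narrow = %d\n", narrow);        /* prints 0: the flag is lost */
        return 0;
    }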
>>
>> Yes, I have that change.
>>
>> Could it be a difference in support for the particular card I am using (Bonaire
>> / GCN 2, so several generations old now), or will that be the same across all
>> of them?
>>
>
> I got a different clip and reproduced the issue. We discussed this with our main "rate control" guy.
> Basically, this parameter cannot guarantee the frame size for a complex scene when it is combined
> with a relatively low bitrate and a relatively low max AU size.
> To confirm this, it would be great if you could share your output stream (or the input stream)
> so we can verify that this is the case.
Input: <http://distribution.bbb3d.renderfarming.net/video/mp4/bbb_sunflower_1080p_60fps_normal.mp4>
Output: <http://ixia.jkqxz.net/~mrt/amf_max_au_size.mp4>
Looking at the transition on frame 976, the output quality is pretty bad, but not really bad enough to merit the failure - the macroblock QPs are only 37/38, and go higher on following frames.
- Mark