[FFmpeg-devel] [PATCH v3 1/2] dxva: wait until D3D11 buffer copies are done before submitting them

Soft Works softworkz at hotmail.com
Sat Aug 8 00:59:27 EEST 2020


> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces at ffmpeg.org> On Behalf Of
> Steve Lhomme
> Sent: Friday, August 7, 2020 3:05 PM
> To: ffmpeg-devel at ffmpeg.org
> Subject: Re: [FFmpeg-devel] [PATCH v3 1/2] dxva: wait until D3D11 buffer
> copies are done before submitting them
> 
> I experimented a bit more with this. Here are the 3 scenarios, in order of
> fewest late frames:
> 
> - GetData waiting for 1/2s and releasing the lock
> - No use of GetData (current code)
> - GetData waiting for 1/2s and keeping the lock
> 
> The last option has horrible performance issues and should not be used.
> 
> The first option gives about 50% fewer late frames compared to the current
> code. *But* it requires unlocking the Video Context. There are 2 problems
> with this:
> 
> - the same ID3D11Asynchronous is used to wait on multiple concurrent
> threads. This can confuse D3D11, which emits a warning in the logs.
> - another thread might Get/Release some buffers and submit them before
> this thread is finished processing. That can result in distortions, for example if
> the second thread/frame depends on the first thread/frame which is not
> submitted yet.
> 
> The former issue can be solved by using an ID3D11Asynchronous per thread.
> That requires thread-local storage (TLS), which FFmpeg doesn't seem to
> support yet. With this I get virtually no late frames.
> 
> The latter issue only occurs if the wait is too long. For example waiting by
> increments of 10ms is too long in my test. Using increments of 1ms or 2ms
> works fine in the most stressing sample I have (Sony Camping HDR HEVC high
> bitrate). But this seems hackish. There's still potentially a quick frame (alt
> frame in VPx/AV1 for example) that might get through to the decoder too
> early. (I suppose that's the source of the distortions I
> see)
> 
> It's also possible to change the order of the buffer sending, by starting with
> the bigger one (D3D11_VIDEO_DECODER_BUFFER_BITSTREAM). But it seems
> to have little influence, regardless of whether we wait for buffer submission or not.
> 
> The results are consistent between integrated GPU and dedicated GPU.
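
For reference, the first scenario quoted above (polling in short increments and
releasing the lock between polls) can be sketched in portable C roughly as
follows. This is only an illustration of the locking pattern, not code from the
patch: ctx_lock, copy_is_done() and wait_for_copy() are hypothetical names, the
pthread mutex stands in for the video context lock, and copy_is_done() stands in
for a GetData() call on the query object. No D3D11 API is used here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for the video context lock. */
static pthread_mutex_t ctx_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for the GPU: the copy reports "done" on the
 * poll that brings this counter down to zero. */
static int polls_until_done = 5;

/* Stand-in for GetData() on the query; call with ctx_lock held. */
static bool copy_is_done(void)
{
    if (polls_until_done > 0)
        polls_until_done--;
    return polls_until_done == 0;
}

static void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Poll every step_ms milliseconds, for at most timeout_ms in total.
 * The lock is held only while polling and is released before sleeping,
 * so other threads can use the context in the meantime. That is also
 * why another thread may submit its buffers before this one is done.
 * Returns the number of polls needed, or -1 on timeout. */
static int wait_for_copy(long step_ms, long timeout_ms)
{
    int polls = 0;

    assert(step_ms > 0);
    for (long waited = 0; waited <= timeout_ms; waited += step_ms) {
        bool done;

        pthread_mutex_lock(&ctx_lock);
        done = copy_is_done();
        pthread_mutex_unlock(&ctx_lock);

        polls++;
        if (done)
            return polls;
        sleep_ms(step_ms);
    }
    return -1;
}
```

A caller would use something like wait_for_copy(1, 500) for the 1ms increments
and 1/2s limit mentioned above. The third scenario corresponds to taking the
lock once around the whole loop instead of per poll.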

Hi Steven,

A while ago I extended the D3D11VA implementation to support single
(non-array) textures for interoperability with Intel QSV+DX11.

I noticed a few bottlenecks making D3D11VA significantly slower than DXVA2.

The solution was to use ID3D10Multithread_SetMultithreadProtected and
remove all the locks which are currently applied.

Hence, I don't think that your patch is the best possible way.

Regards,
softworkz



