[FFmpeg-devel] [PATCH v3 1/2] dxva: wait until D3D11 buffer copies are done before submitting them

Steve Lhomme robux4 at ycbcr.xyz
Wed Aug 12 15:04:41 EEST 2020


On 2020-08-11 12:43, Steve Lhomme wrote:
>>> Sorry if you seem to know all the answers already, but I don't and so 
>>> I have to
>>> investigate.
>>
>> Last year, I had literally worked this down to death. I followed every 
>> slightest
>> hint from countless searches, read through hundreds of discussions, 
>> driven
>> because I was unwilling to believe that up-/downloading of video 
>> textures with
>> D3D11 can't be done equally fast as with D3D9.
>> (the big picture was the implementation of D3D11 support for QuickSync 
>> where
>> the slowdown played a much bigger role than with D3D11VA decoders only).
>> Eventually I landed at some internal Nvidia presentation, some talks 
>> with MS
>> guys and some source code discussion deep inside a 3D game engine (not a
>> no-name). It really bugs me that I didn't properly note the 
>> references, but
>> from somewhere in between I was able to gather solid evidence about what
>> is legal to do and what is not. Based on that, followed several 
>> iterations to
>> find the optimal way for doing the texture transfer. As I had implemented
>> D3D11 support for QuickSync, this got pretty complicated because with
>> a full transcoding pipeline, all parts (decoder, encoder and filters) 
>> can (and
>> usually will) request textures. Only the latest Intel Drivers can work 
>> with
>> array textures everywhere (e.g. VPP), so I also needed to add support for
>> non-array texture allocation. The patch you've seen is the result of 
>> weeks
>> of intensive work (a small but crucial part of it) - even when it may not
>> look like that.
>>
>>
>>> Sorry if you seem to know all the answers already
>>
>> Obviously, I don't know all the answers, but all the answers I have given
>> were correct. And when I didn't have an answer I always respectfully
>> said that your situation might be different.
>> And I didn't reply by implying that you would have done your work
>> by trial-and-error or most likely invalid assumptions or deductions.
>>
>>
>> I still don't know how you are actually operating this and thus I also 
>> cannot
>> tell what might or might not work in your case.
>> All I can tell is that the procedure that I have described (1-2-3-4) can
>> work rock-solid for multi-threaded DX11 texture transfer when it's 
>> done in
>> the same way as I've shown.
>> And believe it or not - I would still be happy when it would be
>> of any use for you...
> 
> Even though the discussion is heated (fitting with the weather here) I 
> don't mind. I learned some stuff and it pushed me to dig deeper. I can't 
> just accept your word for it. I need something solid if I'm going to 
> remove a lock that helped me so far.
> 
> So I'm currently tooling VLC to be able to bring the decoder to its 
> knees and find out what it can and cannot do safely. So far I can still 
> see decoding artifacts when I don't use a lock, which would mean I still 
> need the mutex, for the reasons given in the previous mail.

A follow-up on this. Using ID3D10Multithread seems to be enough to make 
ID3D11Device/ID3D11DeviceContext/etc. mostly thread-safe. Even the 
decoding, with its odd API, seems to know what to do when it is submitted 
different buffers.

I did not manage to saturate the GPU, but I reached a much higher decoding 
speed/throughput to validate the errors I got before. Many of them were 
due to VLC dropping data because of odd timing.

Now I still have some threading issues. For example, for deinterlacing we 
create an ID3D11VideoProcessor. We create it after decoding has started 
(as deinterlacing can be enabled/disabled dynamically). Without the mutex 
in the decoder it crashes in ID3D11VideoDevice::CreateVideoProcessor() and 
ID3D11VideoContext::SubmitDecoderBuffers() when they are called 
simultaneously. If I share the mutex between the decoder and just this 
filter (not the rendering side) it works fine.

So I guess I'm stuck with the mutex for the time being.
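
To make the arrangement concrete, here is a minimal, portable sketch of one 
lock shared by the decoder and the deinterlace filter. It is not the VLC or 
FFmpeg code: it uses pthreads instead of the Windows primitives, and 
video_lock, driver_call(), decoder_thread(), filter_open_thread() and 
run_demo() are hypothetical names. The real D3D11 calls appear only in the 
comments.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

#define FRAMES 100

/* Hypothetical lock shared by the decoder and the deinterlace filter. */
static pthread_mutex_t video_lock = PTHREAD_MUTEX_INITIALIZER;
static int in_driver;   /* 1 while a thread is inside a stand-in driver call */
static int calls_done;  /* number of completed stand-in calls */

/* Stand-in for a driver entry point; must be called with video_lock held. */
static void driver_call(void)
{
    assert(in_driver == 0);   /* fires if two calls ever overlap */
    in_driver = 1;
    sched_yield();            /* give the other thread a chance to run */
    in_driver = 0;
    calls_done++;
}

/* Decoder thread: one submission per frame
 * (where ID3D11VideoContext::SubmitDecoderBuffers() would be called). */
static void *decoder_thread(void *arg)
{
    (void)arg;
    for (int i = 0; i < FRAMES; i++) {
        pthread_mutex_lock(&video_lock);
        driver_call();
        pthread_mutex_unlock(&video_lock);
    }
    return NULL;
}

/* Filter thread: opened while decoding is already running
 * (where ID3D11VideoDevice::CreateVideoProcessor() would be called). */
static void *filter_open_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&video_lock);
    driver_call();
    pthread_mutex_unlock(&video_lock);
    return NULL;
}

/* Runs both threads; returns the number of completed calls, or -1 on error. */
int run_demo(void)
{
    pthread_t dec, flt;

    calls_done = 0;
    if (pthread_create(&dec, NULL, decoder_thread, NULL) != 0)
        return -1;
    if (pthread_create(&flt, NULL, filter_open_thread, NULL) != 0) {
        pthread_join(dec, NULL);
        return -1;
    }
    pthread_join(dec, NULL);
    pthread_join(flt, NULL);
    return calls_done;
}
```

Calling run_demo() should return FRAMES + 1, with the assertion never 
firing because every stand-in call runs under the same lock. The rendering 
side is deliberately left out of the sketch, as it does not take the lock 
in the setup described above.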

Here is the stack trace on an Intel 630 GPU:

igd11dxva64.dll!00007ffc384a8d24() (Unknown Source:0)
igd11dxva64.dll!00007ffc38452030() (Unknown Source:0)
igd11dxva64.dll!00007ffc3845a081() (Unknown Source:0)
igd11dxva64.dll!00007ffc38465a27() (Unknown Source:0)
igd11dxva64.dll!00007ffc386067d2() (Unknown Source:0)
igd11dxva64.dll!00007ffc3883c9f3() (Unknown Source:0)
igd11dxva64.dll!00007ffc3867145a() (Unknown Source:0)
igd11dxva64.dll!00007ffc3866ea23() (Unknown Source:0)
igd11dxva64.dll!00007ffc3881b4ac() (Unknown Source:0)
igd11dxva64.dll!00007ffc384f7bdc() (Unknown Source:0)
igd11dxva64.dll!00007ffc384fa2a5() (Unknown Source:0)
igd11dxva64.dll!00007ffc3847a334() (Unknown Source:0)
d3d11.dll!00007ffcabc33e8d() (Unknown Source:0)
d3d11.dll!00007ffcabc3389d() (Unknown Source:0)
d3d11_3SDKLayers.dll!00007ffc3184fa6b() (Unknown Source:0)
   calling ID3D11VideoContext::SubmitDecoderBuffers()
libavcodec_plugin.dll!ff_dxva2_common_end_frame(AVCodecContext * avctx, 
AVFrame * frame, const void * pp, unsigned int pp_size, const void * qm, 
unsigned int qm_size, int(*)(AVCodecContext *, void *, void *) 
commit_bs_si) Line 1085 
(c:\Users\robux\Documents\Programs\Videolabs\build\win64\contrib\contrib-win64\ffmpeg\libavcodec\dxva2.c:1085)
libavcodec_plugin.dll!dxva2_h264_end_frame(AVCodecContext * avctx) Line 
507 
(c:\Users\robux\Documents\Programs\Videolabs\build\win64\contrib\contrib-win64\ffmpeg\libavcodec\dxva2_h264.c:507)
libavcodec_plugin.dll!ff_h264_field_end(H264Context * h, 
H264SliceContext * sl, int in_setup) Line 171 
(c:\Users\robux\Documents\Programs\Videolabs\build\win64\contrib\contrib-win64\ffmpeg\libavcodec\h264_picture.c:171)
libavcodec_plugin.dll!h264_decode_frame(AVCodecContext * avctx, void * 
data, int * got_frame, AVPacket * avpkt) Line 1015 
(c:\Users\robux\Documents\Programs\Videolabs\build\win64\contrib\contrib-win64\ffmpeg\libavcodec\h264dec.c:1015)
libavcodec_plugin.dll!decode_simple_internal(AVCodecContext * avctx, 
AVFrame * frame) Line 432 
(c:\Users\robux\Documents\Programs\Videolabs\build\win64\contrib\contrib-win64\ffmpeg\libavcodec\decode.c:432)

win32u.dll!00007ffcb0054784() (Unknown Source:0)
gdi32.dll!00007ffcb1e03860() (Unknown Source:0)
d3d11.dll!00007ffcabc756ee() (Unknown Source:0)
d3d11.dll!00007ffcabc5c811() (Unknown Source:0)
igd11dxva64.dll!00007ffc385c5043() (Unknown Source:0)
igd11dxva64.dll!00007ffc384abaa5() (Unknown Source:0)
igd11dxva64.dll!00007ffc384ab7ab() (Unknown Source:0)
igd11dxva64.dll!00007ffc38453b27() (Unknown Source:0)
igd11dxva64.dll!00007ffc384611e6() (Unknown Source:0)
igd11dxva64.dll!00007ffc385cca30() (Unknown Source:0)
igd11dxva64.dll!00007ffc384bb303() (Unknown Source:0)
igd11dxva64.dll!00007ffc3847ccff() (Unknown Source:0)
d3d11.dll!00007ffcabc3e661() (Unknown Source:0)
d3d11.dll!00007ffcabc3d39f() (Unknown Source:0)
d3d11.dll!00007ffcabc3d0cd() (Unknown Source:0)
d3d11.dll!00007ffcabc68a46() (Unknown Source:0)
d3d11.dll!00007ffcabc5955d() (Unknown Source:0)
d3d11_3SDKLayers.dll!00007ffc318a263c() (Unknown Source:0)
d3d11_3SDKLayers.dll!00007ffc3189479a() (Unknown Source:0)
d3d11_3SDKLayers.dll!00007ffc3184e749() (Unknown Source:0)
d3d11.dll!00007ffcabc59d0c() (Unknown Source:0)
d3d11.dll!00007ffcabc3c606() (Unknown Source:0)
d3d11_3SDKLayers.dll!00007ffc3187dd0e() (Unknown Source:0)
   calling ID3D11VideoDevice::CreateVideoProcessor()
libdirect3d11_filters_plugin.dll!D3D11OpenDeinterlace(vlc_object_t * 
obj) Line 297 
(c:\Users\robux\Documents\Programs\Videolabs\vlc\modules\hw\d3d11\d3d11_deinterlace.c:297)
libvlccore.dll!generic_start(void * func, bool forced, char * ap) Line 
294 
(c:\Users\robux\Documents\Programs\Videolabs\vlc\src\modules\modules.c:294)
libvlccore.dll!module_load(vlc_logger * log, module_t * m, int(*)(void 
*, bool, char *) init, bool forced, char * args) Line 212 
(c:\Users\robux\Documents\Programs\Videolabs\vlc\src\modules\modules.c:212)
libvlccore.dll!vlc_module_load(vlc_logger * log, const char * 
capability, const char * name, bool strict, int(*)(void *, bool, char *) 
probe, ...) Line 265 
(c:\Users\robux\Documents\Programs\Videolabs\vlc\src\modules\modules.c:265)
libvlccore.dll!module_need(vlc_object_t * obj, const char * cap, const 
char * name, bool strict) Line 305 
(c:\Users\robux\Documents\Programs\Videolabs\vlc\src\modules\modules.c:305)
libvlccore.dll!filter_chain_AppendInner(filter_chain_t * chain, const 
char * name, const char * capability, config_chain_t * cfg, const 
es_format_t * fmt_out) Line 254 
(c:\Users\robux\Documents\Programs\Videolabs\vlc\src\misc\filter_chain.c:254)
libvlccore.dll!filter_chain_AppendFilter(filter_chain_t * chain, const 
char * name, config_chain_t * cfg, const es_format_t * fmt_out) Line 299 
(c:\Users\robux\Documents\Programs\Videolabs\vlc\src\misc\filter_chain.c:299)
libvlccore.dll!ThreadChangeFilters(vout_thread_sys_t * vout, const char 
* filters, const bool * new_deinterlace, bool is_locked) Line 992 
(c:\Users\robux\Documents\Programs\Videolabs\vlc\src\video_output\video_output.c:992)
libvlccore.dll!Thread(void * object) Line 1891 
(c:\Users\robux\Documents\Programs\Videolabs\vlc\src\video_output\video_output.c:1891)
libvlccore.dll!vlc_entry(void * p) Line 360 
(c:\Users\robux\Documents\Programs\Videolabs\vlc\src\win32\thread.c:360)
msvcrt.dll!00007ffcb139af5a() (Unknown Source:0)
msvcrt.dll!00007ffcb139b02c() (Unknown Source:0)
kernel32.dll!00007ffcb21d6fd4() (Unknown Source:0)
ntdll.dll!00007ffcb23bcec1() (Unknown Source:0)

