[FFmpeg-devel] Fwd: [PATCH] avcodec: allow hardware decoding with multithread for FFmpeg

Thu Jul 28 11:02:20 EEST 2016

On Thu, Jul 28, 2016 at 9:39 AM, Stève Lhomme <robux4 at videolabs.io> wrote:
> On Wed, Jul 27, 2016 at 7:58 PM, Hendrik Leppkes <h.leppkes at gmail.com> wrote:
>> On Wed, Jul 27, 2016 at 6:43 PM, Stève Lhomme <robux4 at videolabs.io> wrote:
>>> Hello fellow FFmpegers,
>>>
>>> Is there still an issue with hardware decoding when combined with
>>> multithread ? It seems to work fine on our Windows build. Although we
>>> have a mutex in place in the D3D11 variant of the code that may help.
>>> It mostly protects the video context...
>>>
>>> If necessary we can have the same trick for DXVA2 if there are still
>>> known issues.
>>>
>>
>> ffmpeg.c just produces corrupt decoding when you try to use that, so
>> use at your own peril.
>
> Is it random or pretty easy to get the issue ? I'd certainly like to
> understand the issue better.

Its really simple: If you lock a frame for read-back while its being
used as a reference frame by the GPU, the decoding on the GPU fails.
You could try to protect every access with a global lock, I guess, but
thats really a rather ugly solution and doesn't really give any
benefits over just not threading (see below for why its not faster
anyway)

This can be instantly reproduced on DXVA2 at least with both Intel and
NVIDIA GPUs (do not know about AMD, don't have any handy).
I'm not sure what happens on Linux, some people said its safer there
due to different ways to copy the data off the surfaces not being a
full lock, at least in the way ffmpeg.c implements it.

>
> With D3D11 we were using the decoder context in the decoder threads
> and the vout thread. In debug builds Windows was complaining with
> various warnings about that. On some phones it was even crashing.
> After protecting the context in the libavcodec and the vout everything
> seems to work fine. Is there anything else that would explain issues ?
> Maybe Linux or Mac don't detect such cross accesses and that's how
> corruption happens ?
>
>> Not to mention that it doesn't offer any performance benefits.
>
> libavcodec can decode multiple frames in parallel why couldn't the GPU
> do it as well ? It would be odd that GPU based solutions are inferior
> to CPU based ones.

GPUs also decode multiple frames at once, which is why performance
suffers when you instantly try to lock them after getting them out of
the decoder (because it needs to sync first), however the GPU does
this silently behind the back in an async fashion, you don't need CPU
threads to feed it simultaneously.

There is a bit of misconception because threaded decoding through
ffmpeg.c *looks* faster right now on some systems, but that comes back
to the forced sync mentioned above. Threads add an inherent delay
until the frame is returned to the caller, and as such gives the GPU
more time to asynchroniously finish the frame. You can fix that
without threads by simply delaying locking the frames by about 2
frames, and get full speed then.

On top of that threading requires allocating more frame surfaces just
for the way threading works in avcodec (one per thread at least, in
extreme cases 16 per thread), so it even costs resources without
gaining anything.
Not to mention that handling threaded hwaccel in avcodec has caused a
variety of bugs in the past, which are hopefully fixed now, but its
still rather ugly due to that.

>
> Last but not least, why is it considered ok to use multiple threads in
> libav and not in FFmpeg ? The code is pretty much the same for GPU
> decoding.
>

Libav runs into the same corruption issues, they just never cared to
do anything about that.

- Hendrik