[FFmpeg-devel] [PATCH 03/14] pthread_frame: merge the functionality for normal decoder init and init_thread_copy

Thu Apr 2 00:35:13 EEST 2020

On 3/31/20 2:47 AM, Anton Khirnov wrote:
> Quoting David Bryant (2020-03-28 21:22:40)
>> On 3/28/20 6:23 AM, Anton Khirnov wrote:
>>> Quoting David Bryant (2020-03-27 23:51:19)
>>>> On 3/27/20 5:57 AM, Anton Khirnov wrote:
>>>>> The current design, where
>>>>> - proper init is called for the first per-thread context
>>>>> - first thread's private data is copied into private data for all the
>>>>>   other threads
>>>>> - a "fixup" function is called for all the other threads to e.g.
>>>>>   allocate dynamically allocated data
>>>>> is very fragile and hard to follow, so it is abandoned. Instead, the
>>>>> same init function is used to init each per-thread context. Where
>>>>> necessary, AVCodecInternal.is_copy can be used to differentiate between
>>>>> the first thread and the other ones (e.g. for decoding the extradata
>>>>> just once).
>>>>> ---
>>>> I'm not sure I fully understand this change. You mention that AVCodecInternal.is_copy can be used where different treatment
>>>> might be necessary for subsequent threads, and I see that done in a couple places, but in most cases you have simply deleted the
>>>> init_thread_copy() function even when it's not at all obvious (to me anyway) that that won't break the codec.
>>> In most cases, just deleting init_thread_copy() is the right thing to
>>> do. E.g. all decoders that do not implement update_thread_context() have
>>> to be intra-only, so every frame thread is effectively a completely
>>> standalone decoder. And in most of the more complex decoders (like h264)
>>> the important parameters are dynamically changeable during decoding, so
>>> not that much is done in decoder init beyond allocating some stuff that
>>> does not depend on the bistream properties.
>>>
>>> My intent is for each frame-thread worker to be treated inasmuch as
>>> possible as a standalone decoder, and where it has to share data with
>>> other threads to make this sharing explicit (rather than implicit as is
>>> the case now).
>> Yes, this makes sense. The confusing part is when the decode_init() function looks completely different than the
>> init_thread_copy() function. This is often because the decode_init() function is generating things (tables, etc.) from scratch
>> and the init_thread_copy() is just copying the necessary things from the original. In cases where the original generation might
>> be time consuming this might make sense, but usually it's probably just making the code more complex and difficult to follow
>> (which I believe was your original point).
> Yes, precisely. I believe for obscure codecs it's better to waste a
> little memory when doing threaded decoding, in return for simpler code.
> And if people really care about being as memory-efficient as possible
> then doing explicit refcounting on allocated buffers is still possible.
>
>> One possible interim solution for complex cases that break would be to leave the init_thread_copy() function there, but instead
>> of having it in the AVCodec struct and called from outside (which is no longer possible), it simply gets called first thing from
>> decode_init() if AVCodecInternal.is_copy is set. That way the architecture is cleaned up now, and the codec won't break and can
>> be cleaned up when time permits. Just a thought.
> I don't really see much point in any interim solutions, since no codecs
> should break. You found a problem with wavpack, that's going to be fixed
> in the next iteration of this patch.
> Speaking of which, ideally we should have a FATE test for DSD files so
> that I would have found the failure myself when running test.

I have created a WavPack DSD test file that exercises all three of the DSD modes (fast, high, and copy). I have also verified
that it would have caught this change, although I had to specify fate multithreading once I discovered it wasn't the default. If
someone can add this to the fate suite (in the wavpack/lossless folder) then I can supply a patch to add the test:

file: http://www.wavpack.com/dsd.wv (MD5: 74b2181f3e9829d9a5b98edd037984ac)
decoded MD5 (f32le): a3a88bba95f809025dce01ecb9064091

>>>> Specifically this will break WavPack because now it will be allocating a new dsdctx for every thread instead of sharing one
>>>> between all threads. I am happy to fix and test this case once the patch goes in, but is the intent of this patch that the
>>>> respective authors will find and fix all the breakages? Or did I just happen to catch the one case you missed?
>>> I certainly intended to convert the decoders correctly myself,
>>> apparently I didn't pay enough attention to the recent wavpack changes.
>>> Hopefully the other conversions are correct, but I will go through the
>>> changes again to check.
>>>
>>> Looking at wavpack, the code looks suspicious to me. You allocate one
>>> dsdctx per channel at init, but AFAIU the number of channels can change
>>> at each frame.
>>>
>> Multichannel WavPack consists of sequences of stereo or mono WavPack blocks that include flags indicating whether they are the
>> beginning or end of a "frame" or "packet". This allows the number of channels available to be essentially unlimited, however the
>> total number of channels in a stream may not change. When decoding, if a sequence of blocks generates more or less than the
>> correct number of channels, this is flagged as an error and overrun is prevented (see libavcodec/wavpack.c, line 1499 and line
>> 1621).
> If my reading of the code is correct, line 1499 only checks for overflow
> of the channel count read for the current sequence. That channel count
> is exported to AVCodeContext in the block starting on line 1464 and
> I don't see any checks for whether it is equal to the initial one.
>
Full disclosure: I did not write the WavPack decoder. I understand _how_ it works, but I don't necessarily understand _why_ it
was written the way it was due to gaps in my knowledge of the FFmpeg internals, and there are certainly things in there that
don't look quite right to me.

That said, I have tested it thoroughly and it handles everything I throw at it, and I have extended it to handle the DSD mode
without anything unexpected happening. And of course it seems to handle extensive fuzzing, and the recent error was related to
my DSD change, suggesting that the core it pretty robust. My inclination has been to not "fix" anything that wasn't broken.

Yes, that channel count stuff is a little difficult to follow. I could experiment with streams that abruptly change channel
configuration, but again I'm hesitant to change code that's worked so well so long.

Kind regards,

David