[FFmpeg-devel] [PATCH] threadprogress: reorder instructions to silence tsan warning.
Zhao Zhili
quinkblack at foxmail.com
Fri Feb 7 13:53:22 EET 2025
> On Feb 7, 2025, at 19:46, Zhao Zhili <quinkblack-at-foxmail.com at ffmpeg.org> wrote:
>
>
>
>> On Feb 7, 2025, at 19:39, Andreas Rheinhardt <andreas.rheinhardt at outlook.com> wrote:
>>
>> Andreas Rheinhardt:
>>> Ronald S. Bultje:
>>>> Fixes #11456.
>>>> ---
>>>> libavcodec/threadprogress.c | 3 +--
>>>> 1 file changed, 1 insertion(+), 2 deletions(-)
>>>>
>>>> diff --git a/libavcodec/threadprogress.c b/libavcodec/threadprogress.c
>>>> index 62c4fd898b..aa72ff80e7 100644
>>>> --- a/libavcodec/threadprogress.c
>>>> +++ b/libavcodec/threadprogress.c
>>>> @@ -55,9 +55,8 @@ void ff_thread_progress_report(ThreadProgress *pro, int n)
>>>>      if (atomic_load_explicit(&pro->progress, memory_order_relaxed) >= n)
>>>>          return;
>>>>
>>>> -    atomic_store_explicit(&pro->progress, n, memory_order_release);
>>>> -
>>>>      ff_mutex_lock(&pro->progress_mutex);
>>>> +    atomic_store_explicit(&pro->progress, n, memory_order_release);
>>>>      ff_cond_broadcast(&pro->progress_cond);
>>>>      ff_mutex_unlock(&pro->progress_mutex);
>>>>  }
>>>
>>> I don't really understand why this is supposed to fix a race; after all,
>>> the synchronisation of ff_thread_progress_(report|await) is not supposed
>>> to be provided by the mutex (which is avoided altogether in the fast
>>> path in ff_thread_progress_await()), but by storing and loading the
>>> progress variable.
>>> That's also the reason why I moved this outside of the mutex (compared
>>> to ff_thread_report_progress()). (This way it is possible for a consumer
>>> thread to see the new progress value earlier and possibly avoid the
>>> mutex altogether.)
>>>
>>
>> Damn, this optimization works, but only if the progress variable is
>> always read with acquire-semantics; it is currently read via
>> memory_order_relaxed inside the mutex (just like in
>> ff_thread_await_progress()).
>>
>> According to my understanding, this is what happens:
>> 1. Consumer thread waits for progress and finds that it is insufficient
>>    (fast path fails).
>> 2. Producer thread updates the progress variable.
>> 3. Consumer thread acquires the mutex and reads the new progress via
>>    memory_order_relaxed.
>> 4. Producer thread acquires the mutex and broadcasts the new progress.
>>
>> I'd prefer to change these semantics so that we always perform
>> synchronisation via the atomic progress variable (unless you know of a
>> performance impact -- I only know that on x86, both memory_order_relaxed
>> and memory_order_acquire are ordinary loads).
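
To make this alternative concrete, here is a minimal sketch of the "always synchronise via the atomic" variant, assuming plain pthreads instead of the ff_mutex/ff_cond wrappers. AcqProgress, acq_init(), acq_report(), acq_await() and acq_demo_run() are names made up for illustration; they are not FFmpeg API. The store stays outside the mutex, so every load that may let the waiter proceed has to be an acquire load.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-in for ThreadProgress; not FFmpeg's actual struct. */
typedef struct AcqProgress {
    atomic_int      progress;
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
} AcqProgress;

void acq_init(AcqProgress *p)
{
    atomic_init(&p->progress, -1);
    pthread_mutex_init(&p->mutex, NULL);
    pthread_cond_init(&p->cond, NULL);
}

void acq_report(AcqProgress *p, int n)
{
    if (atomic_load_explicit(&p->progress, memory_order_relaxed) >= n)
        return;

    /* Published before the mutex is taken, so a waiter may see it early. */
    atomic_store_explicit(&p->progress, n, memory_order_release);

    pthread_mutex_lock(&p->mutex);
    pthread_cond_broadcast(&p->cond);
    pthread_mutex_unlock(&p->mutex);
}

void acq_await(AcqProgress *p, int n)
{
    /* Fast path: no mutex at all. */
    if (atomic_load_explicit(&p->progress, memory_order_acquire) >= n)
        return;

    pthread_mutex_lock(&p->mutex);
    /* Acquire here as well: the store was not made under the mutex,
     * so the mutex alone does not order the producer's earlier writes. */
    while (atomic_load_explicit(&p->progress, memory_order_acquire) < n)
        pthread_cond_wait(&p->cond, &p->mutex);
    pthread_mutex_unlock(&p->mutex);
}

static AcqProgress acq_demo;
static int acq_payload; /* plain data published through the progress value */

static void *acq_producer(void *arg)
{
    (void)arg;
    acq_payload = 42;          /* written before the release store */
    acq_report(&acq_demo, 1);
    return NULL;
}

/* Returns the payload seen by the waiter, or -1 if no thread could be created. */
int acq_demo_run(void)
{
    pthread_t t;
    int seen;

    acq_init(&acq_demo);
    acq_payload = 0;
    if (pthread_create(&t, NULL, acq_producer, NULL))
        return -1;

    acq_await(&acq_demo, 1);
    seen = acq_payload;        /* ordered after the producer's write */

    pthread_join(t, NULL);
    pthread_cond_destroy(&acq_demo.cond);
    pthread_mutex_destroy(&acq_demo.mutex);
    return seen;
}
```

Calling acq_demo_run() from a test program starts one producer and waits for it in the calling thread. The only difference from the current await code is the memory order of the load inside the locked loop.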
>
> I have considered that solution too: always using memory_order_acquire
> when waiting for progress. On ARM, memory_order_relaxed is an ordinary
> load while memory_order_acquire isn't, so there is a real difference.
>
> https://developer.arm.com/documentation/dui0801/l/A64-Data-Transfer-Instructions/LDAPR--A64-
>
> It also seems weird to use memory_order_acquire while holding the mutex.
Cc Remi, who has written VLC's atomic_wait and mutex from scratch.
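
For comparison, here is a minimal sketch of the ordering the patch proposes, again assuming plain pthreads rather than the ff_mutex/ff_cond wrappers. LockedProgress, locked_init(), locked_report(), locked_await() and locked_demo_run() are hypothetical names, not FFmpeg API. Because the store is made while the mutex is held, the mutex itself orders it against any load made under the same mutex, so a relaxed load is enough in the slow path. The lock-free fast path still needs acquire.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-in for ThreadProgress; not FFmpeg's actual struct. */
typedef struct LockedProgress {
    atomic_int      progress;
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
} LockedProgress;

void locked_init(LockedProgress *p)
{
    atomic_init(&p->progress, -1);
    pthread_mutex_init(&p->mutex, NULL);
    pthread_cond_init(&p->cond, NULL);
}

void locked_report(LockedProgress *p, int n)
{
    if (atomic_load_explicit(&p->progress, memory_order_relaxed) >= n)
        return;

    pthread_mutex_lock(&p->mutex);
    /* Release is still required for the waiter's lock-free fast path. */
    atomic_store_explicit(&p->progress, n, memory_order_release);
    pthread_cond_broadcast(&p->cond);
    pthread_mutex_unlock(&p->mutex);
}

void locked_await(LockedProgress *p, int n)
{
    /* Fast path: pairs with the release store above. */
    if (atomic_load_explicit(&p->progress, memory_order_acquire) >= n)
        return;

    pthread_mutex_lock(&p->mutex);
    /* Relaxed suffices: the store happens under the same mutex. */
    while (atomic_load_explicit(&p->progress, memory_order_relaxed) < n)
        pthread_cond_wait(&p->cond, &p->mutex);
    pthread_mutex_unlock(&p->mutex);
}

static LockedProgress locked_demo;
static int locked_payload; /* plain data published through the progress value */

static void *locked_producer(void *arg)
{
    (void)arg;
    locked_payload = 7;        /* written before the progress is reported */
    locked_report(&locked_demo, 1);
    return NULL;
}

/* Returns the payload seen by the waiter, or -1 if no thread could be created. */
int locked_demo_run(void)
{
    pthread_t t;
    int seen;

    locked_init(&locked_demo);
    locked_payload = 0;
    if (pthread_create(&t, NULL, locked_producer, NULL))
        return -1;

    locked_await(&locked_demo, 1);
    seen = locked_payload;

    pthread_join(t, NULL);
    pthread_cond_destroy(&locked_demo.cond);
    pthread_mutex_destroy(&locked_demo.mutex);
    return seen;
}
```

The trade-off is the one Andreas described: the new value becomes visible only once the producer holds the mutex, so a waiter is less likely to get away with the fast path alone.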
>
>>
>> Thanks for looking into this.
>>
>> - Andreas
>>
>> _______________________________________________
>> ffmpeg-devel mailing list
>> ffmpeg-devel at ffmpeg.org
>> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>>
>> To unsubscribe, visit link above, or email
>> ffmpeg-devel-request at ffmpeg.org with subject "unsubscribe".