[FFmpeg-devel] [PATCH] threadprogress: reorder instructions to silence tsan warning.

Fri Feb 7 13:53:22 EET 2025

> On Feb 7, 2025, at 19:46, Zhao Zhili <quinkblack-at-foxmail.com at ffmpeg.org> wrote:
> 
> 
> 
>> On Feb 7, 2025, at 19:39, Andreas Rheinhardt <andreas.rheinhardt at outlook.com> wrote:
>> 
>> Andreas Rheinhardt:
>>> Ronald S. Bultje:
>>>> Fixes #11456.
>>>> ---
>>>> libavcodec/threadprogress.c | 3 +--
>>>> 1 file changed, 1 insertion(+), 2 deletions(-)
>>>> 
>>>> diff --git a/libavcodec/threadprogress.c b/libavcodec/threadprogress.c
>>>> index 62c4fd898b..aa72ff80e7 100644
>>>> --- a/libavcodec/threadprogress.c
>>>> +++ b/libavcodec/threadprogress.c
>>>> @@ -55,9 +55,8 @@ void ff_thread_progress_report(ThreadProgress *pro, int n)
>>>>    if (atomic_load_explicit(&pro->progress, memory_order_relaxed) >= n)
>>>>        return;
>>>> 
>>>> -    atomic_store_explicit(&pro->progress, n, memory_order_release);
>>>> -
>>>>    ff_mutex_lock(&pro->progress_mutex);
>>>> +    atomic_store_explicit(&pro->progress, n, memory_order_release);
>>>>    ff_cond_broadcast(&pro->progress_cond);
>>>>    ff_mutex_unlock(&pro->progress_mutex);
>>>> }
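A minimal, self-contained sketch of the ordering this patch produces, assuming C11 <stdatomic.h> and POSIX threads in place of FFmpeg's ff_mutex_*/ff_cond_* wrappers. Progress, progress_report(), progress_await(), producer() and run_demo() are hypothetical stand-ins for ThreadProgress and ff_thread_progress_(report|await), not the libavcodec code. Because the store now happens while the mutex is held, the relaxed re-check inside the waiter's critical section is ordered by the mutex itself:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-in for ThreadProgress. */
typedef struct Progress {
    atomic_int      progress;
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
} Progress;

static void progress_init(Progress *p)
{
    atomic_init(&p->progress, -1);
    pthread_mutex_init(&p->mutex, NULL);
    pthread_cond_init(&p->cond, NULL);
}

static void progress_destroy(Progress *p)
{
    pthread_cond_destroy(&p->cond);
    pthread_mutex_destroy(&p->mutex);
}

/* Patched order: the store happens while the mutex is held. */
static void progress_report(Progress *p, int n)
{
    /* Only the producer writes progress, so a relaxed check is enough. */
    if (atomic_load_explicit(&p->progress, memory_order_relaxed) >= n)
        return;
    pthread_mutex_lock(&p->mutex);
    atomic_store_explicit(&p->progress, n, memory_order_release);
    pthread_cond_broadcast(&p->cond);
    pthread_mutex_unlock(&p->mutex);
}

static void progress_await(Progress *p, int n)
{
    /* Fast path: this acquire load pairs with the release store above. */
    if (atomic_load_explicit(&p->progress, memory_order_acquire) >= n)
        return;
    pthread_mutex_lock(&p->mutex);
    /* Any new value read here was stored inside the producer's critical
     * section, which must therefore precede ours; the mutex orders the
     * producer's earlier writes before our reads, so relaxed suffices. */
    while (atomic_load_explicit(&p->progress, memory_order_relaxed) < n)
        pthread_cond_wait(&p->cond, &p->mutex);
    pthread_mutex_unlock(&p->mutex);
}

#define ROWS 8

typedef struct Demo {
    Progress prog;
    int rows[ROWS];                   /* plain, non-atomic payload */
} Demo;

static void *producer(void *arg)
{
    Demo *d = arg;
    for (int i = 0; i < ROWS; i++) {
        d->rows[i] = i * i;           /* write the payload first ... */
        progress_report(&d->prog, i); /* ... then publish row i */
    }
    return NULL;
}

/* The consumer runs on the calling thread; returns the sum of all rows,
 * or -1 if the producer thread could not be started. */
static int run_demo(void)
{
    Demo d;
    pthread_t th;
    int sum = 0;

    progress_init(&d.prog);
    if (pthread_create(&th, NULL, producer, &d) != 0) {
        progress_destroy(&d.prog);
        return -1;
    }
    for (int i = 0; i < ROWS; i++) {
        progress_await(&d.prog, i);   /* rows[0..i] are now safe to read */
        sum += d.rows[i];
    }
    pthread_join(th, NULL);
    progress_destroy(&d.prog);
    return sum;
}
```

Calling run_demo() should return 140 (0 + 1 + 4 + ... + 49). The cost of this variant is that a waiter which misses the fast path can only observe the new value after the producer has taken the mutex.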
>>> 
>>> I don't really understand why this is supposed to fix a race; after all,
>>> the synchronisation of ff_thread_progress_(report|await) is not supposed
>>> to be provided by the mutex (which is avoided altogether in the fast
>>> path in ff_thread_progress_await()), but by storing and loading the
>>> progress variable.
>>> That's also the reason why I moved the store outside of the mutex
>>> (compared to ff_thread_report_progress()). This way a consumer thread
>>> can see the new progress value earlier and possibly avoid the mutex
>>> altogether.
>>> 
>> 
>> Damn, this optimization works, but only if the progress variable is
>> always read with acquire semantics; it is currently read with
>> memory_order_relaxed inside the mutex (just like in
>> ff_thread_await_progress()).
>> 
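>> According to my understanding, this is what happens:
>> 1. The consumer thread waits for progress and finds that it is
>>    insufficient (the fast path fails).
>> 2. The producer thread updates the progress variable.
>> 3. The consumer thread acquires the mutex and reads the new progress
>>    via memory_order_relaxed.
>> 4. The producer thread acquires the mutex and broadcasts the new
>>    progress.
>> In step 3 the new value is read without acquire semantics, and the
>> mutex is ordered consumer-then-producer, so nothing makes the
>> producer's earlier writes happen-before the consumer's later reads.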
>> 
>> I'd prefer to change these semantics so that we always perform
>> synchronisation via the atomic progress variable (unless you know of a
>> performance impact -- I only know that on x86, both memory_order_relaxed
>> and memory_order_acquire are ordinary loads).
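A sketch of the alternative proposed above, under the same assumptions as any stand-alone illustration here (C11 atomics, POSIX threads, and hypothetical names Progress, progress_report(), progress_await(), producer() and run_demo() rather than the libavcodec API). The store stays outside the mutex so a waiter can see it early, and the re-check inside the mutex uses memory_order_acquire, so the atomic variable alone provides synchronisation:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-in for ThreadProgress. */
typedef struct Progress {
    atomic_int      progress;
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
} Progress;

static void progress_init(Progress *p)
{
    atomic_init(&p->progress, -1);
    pthread_mutex_init(&p->mutex, NULL);
    pthread_cond_init(&p->cond, NULL);
}

static void progress_destroy(Progress *p)
{
    pthread_cond_destroy(&p->cond);
    pthread_mutex_destroy(&p->mutex);
}

static void progress_report(Progress *p, int n)
{
    if (atomic_load_explicit(&p->progress, memory_order_relaxed) >= n)
        return;
    /* Publish before taking the mutex so waiters may skip it entirely. */
    atomic_store_explicit(&p->progress, n, memory_order_release);
    /* Locking after the store still prevents lost wakeups: a waiter that
     * saw the old value is already blocked when we broadcast. */
    pthread_mutex_lock(&p->mutex);
    pthread_cond_broadcast(&p->cond);
    pthread_mutex_unlock(&p->mutex);
}

static void progress_await(Progress *p, int n)
{
    if (atomic_load_explicit(&p->progress, memory_order_acquire) >= n)
        return;
    pthread_mutex_lock(&p->mutex);
    /* Acquire here too: the store may precede the producer's lock, so the
     * mutex alone does not order the producer's writes before our reads. */
    while (atomic_load_explicit(&p->progress, memory_order_acquire) < n)
        pthread_cond_wait(&p->cond, &p->mutex);
    pthread_mutex_unlock(&p->mutex);
}

#define ROWS 8

typedef struct Demo {
    Progress prog;
    int rows[ROWS];                   /* plain, non-atomic payload */
} Demo;

static void *producer(void *arg)
{
    Demo *d = arg;
    for (int i = 0; i < ROWS; i++) {
        d->rows[i] = i * i;           /* write the payload first ... */
        progress_report(&d->prog, i); /* ... then publish row i */
    }
    return NULL;
}

/* The consumer runs on the calling thread; returns the sum of all rows,
 * or -1 if the producer thread could not be started. */
static int run_demo(void)
{
    Demo d;
    pthread_t th;
    int sum = 0;

    progress_init(&d.prog);
    if (pthread_create(&th, NULL, producer, &d) != 0) {
        progress_destroy(&d.prog);
        return -1;
    }
    for (int i = 0; i < ROWS; i++) {
        progress_await(&d.prog, i);   /* rows[0..i] are now safe to read */
        sum += d.rows[i];
    }
    pthread_join(th, NULL);
    progress_destroy(&d.prog);
    return sum;
}
```

run_demo() should again return 140. The trade-off is that every slow-path load is now a load-acquire, which on AArch64 is a different instruction from a plain load.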
> 
> I considered that solution too: always using memory_order_acquire
> when waiting for progress. But on ARM, memory_order_relaxed is a normal
> load while memory_order_acquire is not (on AArch64 it compiles to a
> load-acquire such as LDAR or LDAPR), so there is a real difference.
> 
> https://developer.arm.com/documentation/dui0801/l/A64-Data-Transfer-Instructions/LDAPR--A64-
> 
> Besides, using memory_order_acquire while holding the mutex looks odd.

CC Remi, who wrote VLC's atomic_wait and mutex implementations from scratch.

> 
>> 
>> Thanks for looking into this.
>> 
>> - Andreas
>> 
>> _______________________________________________
>> ffmpeg-devel mailing list
>> ffmpeg-devel at ffmpeg.org
>> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>> 
>> To unsubscribe, visit link above, or email
>> ffmpeg-devel-request at ffmpeg.org with subject "unsubscribe".