[FFmpeg-devel] [PATCH] threadprogress: reorder instructions to silence tsan warning.

Ronald S. Bultje rsbultje at gmail.com
Sat Feb 8 21:31:11 EET 2025


Hi,

On Fri, Feb 7, 2025 at 10:50 PM Zhao Zhili <quinkblack-at-foxmail.com at ffmpeg.org> wrote:

>
>
> > On Feb 8, 2025, at 00:05, Ronald S. Bultje <rsbultje at gmail.com> wrote:
> >
> > Hi,
> >
> > On Fri, Feb 7, 2025 at 8:44 AM Zhao Zhili <
> > quinkblack-at-foxmail.com at ffmpeg.org> wrote:
> >
> >>> On Feb 7, 2025, at 21:26, Ronald S. Bultje <rsbultje at gmail.com> wrote:
> >>> On Fri, Feb 7, 2025 at 6:22 AM Andreas Rheinhardt <
> >>> andreas.rheinhardt at outlook.com> wrote:
> >>>> Ronald S. Bultje:
> >>>>> Fixes #11456.
> >>>>> ---
> >>>>> libavcodec/threadprogress.c | 3 +--
> >>>>> 1 file changed, 1 insertion(+), 2 deletions(-)
> >>>>>
> >>>>> diff --git a/libavcodec/threadprogress.c b/libavcodec/threadprogress.c
> >>>>> index 62c4fd898b..aa72ff80e7 100644
> >>>>> --- a/libavcodec/threadprogress.c
> >>>>> +++ b/libavcodec/threadprogress.c
> >>>>> @@ -55,9 +55,8 @@ void ff_thread_progress_report(ThreadProgress *pro, int n)
> >>>>>    if (atomic_load_explicit(&pro->progress, memory_order_relaxed) >= n)
> >>>>>        return;
> >>>>>
> >>>>> -    atomic_store_explicit(&pro->progress, n, memory_order_release);
> >>>>> -
> >>>>>    ff_mutex_lock(&pro->progress_mutex);
> >>>>> +    atomic_store_explicit(&pro->progress, n, memory_order_release);
> >>>>>    ff_cond_broadcast(&pro->progress_cond);
> >>>>>    ff_mutex_unlock(&pro->progress_mutex);
> >>>>> }
> >>>>
> >>>> I don't really understand why this is supposed to fix a race; after
> >>>> all, the synchronisation of ff_thread_progress_(report|await) is not
> >>>> supposed to be provided by the mutex (which is avoided altogether in
> >>>> the fast path in ff_thread_progress_await()), but by storing and
> >>>> loading the progress variable.
> >>>> That's also the reason why I moved this outside of the mutex
> >>>> (compared to ff_thread_report_progress()). (This way it is possible
> >>>> for a consumer thread to see the new progress value earlier and
> >>>> possibly avoid the mutex altogether.)
> >>>
> >>>
> >>> The consumer thread already checks the value without the lock, so the
> >>> significance of that last point seems minor to me. This would be
> >>> different if the wait() counterpart had no lockless path. Or am I
> >>> missing something?
> >>
> >> What Andreas says is that doing the atomic_store before mutex_lock
> >> gives the first atomic_load in progress_wait a higher chance to
> >> succeed. The earlier progress is set, the higher the chance that
> >> progress_wait goes into the fast path.
> >
> >
> > I understand that is true in theory - but I have doubts on whether
> > this is in any way significant in practice if wait() already has a
> > lockless fast path to pre-empt the mutex.
> >
> > I measured this in the most unscientific way possible by decoding
> > gizmo.webm (from Firefox's bug report) 10x before and after my patch,
> > taking the average and standard deviation, and comparing these with
> > each other. I repeated this a couple of times. The values (before vs.
> > after, avg +/- stddev) are obviously never exactly the same, but they
> > swarm around each other like a random noise generator. Or to say it
> > differently: in my highly unscientific test, I see no performance
> > difference.
> >
> > So ... is this really worth it?
>
> I did another test by measuring fast_path / (fast_path + slow_path) for
> hevc decoding with 10 threads on macOS.
>
> 1. Before the patch, it’s 99.741%.
> 2. With the patch, it’s 99.743%.
> 3. With while (atomic_load_explicit(&pro->progress, memory_order_acquire) < n), it’s 99.741%.
>
> So it doesn’t matter for performance. The current patch is the most
> elegant solution in my opinion.


Thanks for testing. Andreas, any further thoughts?

Ronald

