[FFmpeg-devel] [PATCH] Stop using _explicit atomic operations where not necessary.

Andreas Rheinhardt andreas.rheinhardt at outlook.com
Sun Jun 6 14:39:33 EEST 2021


James Almer:
> On 6/5/2021 11:27 AM, Anton Khirnov wrote:
>> Memory ordering constraints other than the default (sequentially
>> consistent) can behave in very unintuitive and unexpected ways, and so
>> should be avoided unless there is a strong (typically performance)
>> reason for using them. This holds especially for memory_order_relaxed,
>> which imposes no ordering constraints.
> 
> Performance is important for FIFO, though. Could they use a laxer order
> at least for the fetch_sub and fetch_add cases?
> 
> Also, commit fed50c4304e, which made cpu_flags an atomic type, states
> that ThreadSanitizer stopped reporting races on tests/cpu_init after
> that change, so maybe memory order is not important for it, only atomicity.

This is true: There can by definition be no data race involving an
atomic variable that is exclusively accessed via atomic operations: "The
execution of a program contains a data race if it contains two
conflicting actions in different threads, at least one of which is *not
atomic*, and neither happens before the other." (C11, 5.1.2.4.25)

Therefore fed50c4304eecb352e29ce789cdb96ea84d6162f and
a2a38b160620d91bc3f895dadc4501c589998b9c fixed data races (the latter
also a race condition); using sequentially consistent ordering meanwhile
brings no gain: It may provide synchronization between two threads
calling e.g. av_get_cpu_flags(), but the only effect of this is that
tsan might overlook buggy code that actually contains races. It notably
does not really establish an ordering: If two threads call
av_cpu_count() concurrently, then it is indetermined which of the two
threads will call av_log (if it has not already been called), but with
this patch it is certain that the second thread synchronizes with the
first thread and sees everything that the first thread has done so far
(which can mask bugs); and if we look at the test in
libavutil/tests/cpu_init.c, then all the possibilities that existed with
memory_order_relaxed also exist with memory_order_seq_cst:
 - Thread 1 can read first, see that cpu_flags is uninitialized and
initialize it, followed by thread 2 seeing and reusing the already
initialized value.
 - Thread 1 can read first, followed by thread 2 reading the value (with
both threads seeing that cpu_flags is uninitialized), followed by thread
1 storing its value, followed by thread 2 overwriting the value again;
the order of writes can also be the other way around.
 - Same as the last one, but with the order of stores reversed.
 - Exchanging the roles of thread 1 and thread 2 in the above scenarios
leads to the remaining cases.

> In any case these cpu and mem functions are hardly a bottleneck
> anywhere, and rarely called outside of debugging, so this change is ok.
> 
Mem functions are rarely called outside of debugging?

>> ---
>>   libavformat/fifo.c |  8 ++++----

The threading in fifo.c is weird: A variable (overflow_flag) that is
guarded by a mutex is marked as volatile; moreover it appears that one
could avoid said mutex altogether by making overflow_flag atomic. And
finally, the worker thread is only destroyed in write_trailer, but if
this is never called, then it will never be destroyed. Furtermore, in
this case the worker thread might wait on a condition variable that is
destroyed in av_thread_message_queue_free(), which is undefined behaviour.
I don't know whether the accesses to queue_duration need a
memory_ordering different than memory_order_relaxed.

>>   libavutil/cpu.c    |  8 ++++----
>>   libavutil/mem.c    | 10 +++++-----
>>   3 files changed, 13 insertions(+), 13 deletions(-)
>>
>> diff --git a/libavformat/fifo.c b/libavformat/fifo.c
>> index 50656f78b7..2ac14207b8 100644
>> --- a/libavformat/fifo.c
>> +++ b/libavformat/fifo.c
>> @@ -185,7 +185,7 @@ static int
>> fifo_thread_write_packet(FifoThreadContext *ctx, AVPacket *pkt)
>>       int ret, s_idx;
>>         if (fifo->timeshift && pkt->dts != AV_NOPTS_VALUE)
>> -        atomic_fetch_sub_explicit(&fifo->queue_duration,
>> next_duration(avf, pkt, &ctx->last_received_dts), memory_order_relaxed);
>> +        atomic_fetch_sub(&fifo->queue_duration, next_duration(avf,
>> pkt, &ctx->last_received_dts));
>>         if (ctx->drop_until_keyframe) {
>>           if (pkt->flags & AV_PKT_FLAG_KEY) {
>> @@ -454,7 +454,7 @@ static void *fifo_consumer_thread(void *data)
>>               av_log(avf, AV_LOG_INFO, "FIFO queue flushed\n");
>>             if (fifo->timeshift)
>> -            while (atomic_load_explicit(&fifo->queue_duration,
>> memory_order_relaxed) < fifo->timeshift)
>> +            while (atomic_load(&fifo->queue_duration) < fifo->timeshift)
>>                   av_usleep(10000);
>>             ret = av_thread_message_queue_recv(queue, &msg, 0);
>> @@ -594,7 +594,7 @@ static int fifo_write_packet(AVFormatContext *avf,
>> AVPacket *pkt)
>>       }
>>         if (fifo->timeshift && pkt && pkt->dts != AV_NOPTS_VALUE)
>> -        atomic_fetch_add_explicit(&fifo->queue_duration,
>> next_duration(avf, pkt, &fifo->last_sent_dts), memory_order_relaxed);
>> +        atomic_fetch_add(&fifo->queue_duration, next_duration(avf,
>> pkt, &fifo->last_sent_dts));
>>         return ret;
>>   fail:
>> @@ -621,7 +621,7 @@ static int fifo_write_trailer(AVFormatContext *avf)
>>               } else {
>>                   now += delay;
>>               }
>> -            atomic_fetch_add_explicit(&fifo->queue_duration, delay,
>> memory_order_relaxed);
>> +            atomic_fetch_add(&fifo->queue_duration, delay);
>>               elapsed += delay;
>>               if (elapsed > fifo->timeshift)
>>                   break;
>> diff --git a/libavutil/cpu.c b/libavutil/cpu.c
>> index 8960415d00..8a6af81ae4 100644
>> --- a/libavutil/cpu.c
>> +++ b/libavutil/cpu.c
>> @@ -89,15 +89,15 @@ void av_force_cpu_flags(int arg){
>>           arg |= AV_CPU_FLAG_MMX;
>>       }
>>   -    atomic_store_explicit(&cpu_flags, arg, memory_order_relaxed);
>> +    atomic_store(&cpu_flags, arg);
>>   }
>>     int av_get_cpu_flags(void)
>>   {
>> -    int flags = atomic_load_explicit(&cpu_flags, memory_order_relaxed);
>> +    int flags = atomic_load(&cpu_flags);
>>       if (flags == -1) {
>>           flags = get_cpu_flags();
>> -        atomic_store_explicit(&cpu_flags, flags, memory_order_relaxed);
>> +        atomic_store(&cpu_flags, flags);
>>       }
>>       return flags;
>>   }
>> @@ -221,7 +221,7 @@ int av_cpu_count(void)
>>       nb_cpus = sysinfo.dwNumberOfProcessors;
>>   #endif
>>   -    if (!atomic_exchange_explicit(&printed, 1, memory_order_relaxed))
>> +    if (!atomic_exchange(&printed, 1))
>>           av_log(NULL, AV_LOG_DEBUG, "detected %d logical cores\n",
>> nb_cpus);
>>         return nb_cpus;
>> diff --git a/libavutil/mem.c b/libavutil/mem.c
>> index a52d33d4a6..c216c0314f 100644
>> --- a/libavutil/mem.c
>> +++ b/libavutil/mem.c
>> @@ -72,14 +72,14 @@ void  free(void *ptr);
>>   static atomic_size_t max_alloc_size = ATOMIC_VAR_INIT(INT_MAX);
>>     void av_max_alloc(size_t max){
>> -    atomic_store_explicit(&max_alloc_size, max, memory_order_relaxed);
>> +    atomic_store(&max_alloc_size, max);
>>   }
>>     void *av_malloc(size_t size)
>>   {
>>       void *ptr = NULL;
>>   -    if (size > atomic_load_explicit(&max_alloc_size,
>> memory_order_relaxed))
>> +    if (size > atomic_load(&max_alloc_size))
>>           return NULL;
>>     #if HAVE_POSIX_MEMALIGN
>> @@ -135,7 +135,7 @@ void *av_malloc(size_t size)
>>   void *av_realloc(void *ptr, size_t size)
>>   {
>>       void *ret;
>> -    if (size > atomic_load_explicit(&max_alloc_size,
>> memory_order_relaxed))
>> +    if (size > atomic_load(&max_alloc_size))
>>           return NULL;
>>     #if HAVE_ALIGNED_MALLOC
>> @@ -489,7 +489,7 @@ void *av_fast_realloc(void *ptr, unsigned int
>> *size, size_t min_size)
>>       if (min_size <= *size)
>>           return ptr;
>>   -    max_size = atomic_load_explicit(&max_alloc_size,
>> memory_order_relaxed);
>> +    max_size = atomic_load(&max_alloc_size);
>>         if (min_size > max_size) {
>>           *size = 0;
>> @@ -521,7 +521,7 @@ static inline void fast_malloc(void *ptr, unsigned
>> int *size, size_t min_size, i
>>           return;
>>       }
>>   -    max_size = atomic_load_explicit(&max_alloc_size,
>> memory_order_relaxed);
>> +    max_size = atomic_load(&max_alloc_size);
>>         if (min_size > max_size) {
>>           av_freep(ptr);
>>
> 


More information about the ffmpeg-devel mailing list