-
Notifications
You must be signed in to change notification settings - Fork 2
Description
Does the following litmus test exhibit undefined behavior according to the C++23 memory consistency model?
C lb-lna-faddrel-lrlx-lacq-sna
{ [x] = 0; [y] = 0; }
P0 (int* x, int* y) {
int a = *y;
atomic_fetch_add_explicit(x, 1, memory_order_release);
}
P1 (int* x, int* y) {
int b = atomic_load_explicit(x, memory_order_relaxed);
int c = atomic_load_explicit(x, memory_order_acquire);
if (b != 0) {
*y = 1;
}
}
~exists(0:a=1)I think it does.
However, herd with cpp11.cat/cpp17.cat reports it as well-defined (no Undef).
Adding an extra thread causes herd to report Undef.
C lb-lna-faddrel-lrlx-lacq-sna-srlx
{ [x] = 0; [y] = 0; }
P0 (int* x, int* y) {
int a = *y;
atomic_fetch_add_explicit(x, 1, memory_order_release);
}
P1 (int* x, int* y) {
int b = atomic_load_explicit(x, memory_order_relaxed);
int c = atomic_load_explicit(x, memory_order_acquire);
if (b != 0) {
*y = 1;
}
}
P2 (int* y) {
atomic_store_explicit(x, 0, memory_order_relaxed);
}
~exists(0:a=1)Pros and Cons of acquire sequences
What optimizations does lack of acquire fences enable?
Since an acquire-fence may precede any dead atomic load:
atomic_thread_fence(memory_order_acquire);
....much later....
atomic_load_explicit(x, memory_order_relaxed); // is there an acquire-fence before this dead-load? can't know in generalcompiler cannot, in general (i.e. without proving absence of acquire-fences), remove dead atomic loads as part of C++ -> C++ transformations. That is, lack of acquire fences does not, in general, enable compilers to remove dead acquire loads.
What optimizations do acquire fences enable?
Acquire fences enable C++ -> C++ transformations for common concurrency idioms that result in less total number of acquire operations in a program. For example, consider the common case of polling on a flag to then read some data (MP):
while (atomic_load_explicit(flag, memory_order_acquire) == 0);
auto data = *x;Acquire fences make the following transformation sound:
while (atomic_load_explicit(flag, memory_order_relaxed) == 0);
atomic_load_explicit(flag, memory_order_acquire); // 1x acquire op: acquire-pattern with preceding flag loads only
auto data = *x;This transformation is slightly better than:
while (atomic_load_explicit(x, memory_order_relaxed) == 0);
atomic_thread_fence(memory_order_acquire); // 1x acquire fence: acquire-pattern with all preceding atomic loads
auto data = *x;