Skip to content

Acquire sequences #5

@gonzalobg

Description

@gonzalobg

Does the following litmus test exhibit undefined behavior according to the C++23 memory consistency model?

C lb-lna-faddrel-lrlx-lacq-sna
{ [x] = 0; [y] = 0; }

P0 (int* x, int* y) {
  int a = *y;
  atomic_fetch_add_explicit(x, 1, memory_order_release);
}

P1 (int* x, int* y) {
  int b = atomic_load_explicit(x, memory_order_relaxed);
  int c = atomic_load_explicit(x, memory_order_acquire);
  if (b != 0) {
    *y = 1;
  }
}

~exists(0:a=1)

I think it does.
However, herd with cpp11.cat/cpp17.cat reports it as well-defined (no Undef).
Adding an extra thread causes herd to report Undef.

C lb-lna-faddrel-lrlx-lacq-sna-srlx
{ [x] = 0; [y] = 0; }

P0 (int* x, int* y) {
  int a = *y;
  atomic_fetch_add_explicit(x, 1, memory_order_release);
}

P1 (int* x, int* y) {
  int b = atomic_load_explicit(x, memory_order_relaxed);
  int c = atomic_load_explicit(x, memory_order_acquire);
  if (b != 0) {
    *y = 1;
  }
}

P2 (int* y) {
  atomic_store_explicit(x, 0, memory_order_relaxed);
}

~exists(0:a=1)

Pros and Cons of acquire sequences

What optimizations does lack of acquire fences enable?

Since an acquire-fence may precede any dead atomic load:

atomic_thread_fence(memory_order_acquire);
....much later....
atomic_load_explicit(x, memory_order_relaxed); // is there an acquire-fence before this dead-load? can't know in general

compiler cannot, in general (i.e. without proving absence of acquire-fences), remove dead atomic loads as part of C++ -> C++ transformations. That is, lack of acquire fences does not, in general, enable compilers to remove dead acquire loads.

What optimizations do acquire fences enable?

Acquire fences enable C++ -> C++ transformations for common concurrency idioms that result in less total number of acquire operations in a program. For example, consider the common case of polling on a flag to then read some data (MP):

while (atomic_load_explicit(flag, memory_order_acquire) == 0);
auto data = *x;

Acquire fences make the following transformation sound:

while (atomic_load_explicit(flag, memory_order_relaxed) == 0);
atomic_load_explicit(flag, memory_order_acquire); // 1x acquire op: acquire-pattern with preceding flag loads only
auto data = *x;

This transformation is slightly better than:

while (atomic_load_explicit(x, memory_order_relaxed) == 0);
atomic_thread_fence(memory_order_acquire);  // 1x acquire fence: acquire-pattern with all preceding atomic loads
auto data = *x;

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions