Acquire sequences

Does the following litmus test exhibit undefined behavior according to the C++23 memory consistency model?

```cpp
C lb-lna-faddrel-lrlx-lacq-sna
{ [x] = 0; [y] = 0; }

P0 (int* x, int* y) {
  int a = *y;
  atomic_fetch_add_explicit(x, 1, memory_order_release);
}

P1 (int* x, int* y) {
  int b = atomic_load_explicit(x, memory_order_relaxed);
  int c = atomic_load_explicit(x, memory_order_acquire);
  if (b != 0) {
    *y = 1;
  }
}

~exists(0:a=1)
```

I think it does.
However, `herd` with `cpp11.cat`/`cpp17.cat` reports it as well-defined (no `Undef`).
Adding an extra thread causes `herd` to report `Undef`. 

```cpp
C lb-lna-faddrel-lrlx-lacq-sna-srlx
{ [x] = 0; [y] = 0; }

P0 (int* x, int* y) {
  int a = *y;
  atomic_fetch_add_explicit(x, 1, memory_order_release);
}

P1 (int* x, int* y) {
  int b = atomic_load_explicit(x, memory_order_relaxed);
  int c = atomic_load_explicit(x, memory_order_acquire);
  if (b != 0) {
    *y = 1;
  }
}

P2 (int* y) {
  atomic_store_explicit(x, 0, memory_order_relaxed);
}

~exists(0:a=1)
```

# Pros and Cons of acquire sequences

## What optimizations does lack of acquire fences enable?

Since an acquire-fence may precede any dead atomic load:

```cpp
atomic_thread_fence(memory_order_acquire);
....much later....
atomic_load_explicit(x, memory_order_relaxed); // is there an acquire-fence before this dead-load? can't know in general
```

compiler cannot, in general (i.e. without proving absence of acquire-fences), remove dead atomic loads as part of C++ -> C++ transformations. That is, lack of acquire fences does not, in general, enable compilers to remove dead acquire loads.

## What optimizations do acquire fences enable?

Acquire fences enable C++ -> C++ transformations for common concurrency idioms that result in less total number of acquire operations in a program. For example, consider the common case of polling on a flag to then read some data (MP): 

```c++
while (atomic_load_explicit(flag, memory_order_acquire) == 0);
auto data = *x;
```

Acquire fences make the following transformation sound:

```c++
while (atomic_load_explicit(flag, memory_order_relaxed) == 0);
atomic_load_explicit(flag, memory_order_acquire); // 1x acquire op: acquire-pattern with preceding flag loads only
auto data = *x;
```

This transformation is slightly better than: 

```c++
while (atomic_load_explicit(x, memory_order_relaxed) == 0);
atomic_thread_fence(memory_order_acquire);  // 1x acquire fence: acquire-pattern with all preceding atomic loads
auto data = *x;
```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Acquire sequences #5

Pros and Cons of acquire sequences

What optimizations does lack of acquire fences enable?

What optimizations do acquire fences enable?

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Acquire sequences #5

Description

Pros and Cons of acquire sequences

What optimizations does lack of acquire fences enable?

What optimizations do acquire fences enable?

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions