C++ Knowledge Base

A textbook of C++ features that come up in this project. Each chapter introduces a concept with a simple, self-contained example, builds intuition through explanation, then closes with a second example showing how the feature is used in our persistent<T> library.

The chapters are ordered roughly by dependency — later ones assume earlier ones — but each is self-contained enough to be read on its own.


Table of Contents

  1. Class Templates
  2. Function Templates
  3. Variadic Templates and Parameter Packs
  4. Perfect Forwarding
  5. Template Specialization
  6. Type Aliases
  7. Nested Type Members
  8. Special Member Functions
  9. Lambdas and Captures
  10. Placement New
  11. Operator Overloading
  12. inline Variables and Functions
  13. noexcept
  14. GCC/Clang Attributes
  15. Header Guards
  16. Copy and Move Semantics
  17. Object Layout and reinterpret_cast
  18. const Correctness and Const Overloading

Chapter 1: Class Templates

The idea

Suppose you want a class that holds one value, no matter what type. You could write IntBox, DoubleBox, StringBox, but they'd all be the same code with the type swapped. A class template lets you write that code once, parameterized on the type, and the compiler generates the specific versions for you on demand.

Simple example

#include <string>

template <typename T>
class Box {
    T value;
public:
    Box(T v) : value(v) {}
    T get() const { return value; }
};

int main() {
    Box<int> a(42);          // Box for ints
    Box<double> b(3.14);     // Box for doubles
    Box<std::string> c("hi"); // Box for strings
    return a.get();          // 42
}

There is no class named Box at runtime. There are three classes — Box<int>, Box<double>, Box<std::string> — each generated by the compiler the first time you use it. They share source code but not types: you cannot assign Box<int> to Box<double>.

This is sometimes called monomorphization: a single polymorphic template becomes many concrete, monomorphic types. It happens at compile time and costs nothing at runtime (no dynamic dispatch, no vtables, no void *).

Lazy instantiation

The compiler only generates the bodies of template members that are actually called. If Box<int>::get() is called but Box<int>::set() is never called, the compiler may not even check whether set would compile for int. This sounds permissive, but it is also why template errors are often confusing — you discover them at the first call site, not at the template definition.
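
A minimal sketch of lazy instantiation, assuming a hypothetical LazyBox with a negate() member that only makes sense for arithmetic types. The class instantiates fine for std::string as long as nobody calls negate() on it:

```cpp
#include <cassert>
#include <string>

// Hypothetical variation on Box: negate() is only valid for arithmetic T.
template <typename T>
class LazyBox {
    T value;
public:
    explicit LazyBox(T v) : value(v) {}
    T get() const { return value; }
    T negate() const { return -value; }   // ill-formed for std::string, but only if instantiated
};

inline int lazy_demo() {
    LazyBox<std::string> s("hi");   // fine: the body of negate() is never instantiated
    assert(s.get() == "hi");
    // s.negate();                  // uncommenting this line is what triggers the error
    LazyBox<int> i(42);
    return i.negate();              // instantiates LazyBox<int>::negate only
}
```

The error for LazyBox<std::string>::negate would surface at the commented-out call, not at the template definition.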

Where this appears in persistent<T>

PersistentAllocator<T> is a class template just like Box<T>:

template <typename T>
class PersistentAllocator {
public:
    using value_type = T;
    using pointer = T*;               // aliases used by allocate below
    using size_type = std::size_t;

    pointer allocate(size_type n) {
        return static_cast<pointer>(pmem_alloc(n * sizeof(T), alignof(T)));
    }
    /* ... */
};

When a user writes std::vector<int, PersistentAllocator<int>>, the compiler generates a concrete class PersistentAllocator<int> with value_type aliased to int, allocate written for int-sized requests, and so on. A second user writing std::list<Node, PersistentAllocator<Node>> triggers a second instantiation, PersistentAllocator<Node>. The two are unrelated types, each with its own generated code.


Chapter 2: Function Templates

The idea

A function template is the same idea as a class template but applied to functions. You write one source-level function definition, parameterized on a type, and the compiler generates a separate version for each type the function is called with.

Simple example

template <typename T>
T max(T a, T b) {
    return a > b ? a : b;
}

int main() {
    max(3, 7);          // compiler picks max<int>
    max(2.5, 1.1);      // compiler picks max<double>
}

Notice how you don't have to write max<int>(3, 7) explicitly — the compiler deduces the template argument from the call. This is template argument deduction, and it's what makes function templates feel like ordinary functions.

You can spell out the type if you want to disambiguate: max<double>(3, 7) forces T = double, so both int arguments are converted to double before the comparison.

When deduction fails

Sometimes the compiler can't deduce — for instance, if the same type parameter appears in two arguments with conflicting types:

max(3, 2.5);  // error: T deduced as int from 3, double from 2.5

The fix is either to spell out the type — max<double>(3, 2.5) — or to give each argument its own type parameter.
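
The second fix can be sketched as follows. The name max_of is hypothetical (chosen to avoid clashing with std::max), and std::common_type_t is one reasonable choice of return type, not the only one:

```cpp
#include <cassert>
#include <type_traits>

// Each argument gets its own type parameter, so deduction cannot conflict.
template <typename A, typename B>
std::common_type_t<A, B> max_of(A a, B b) {
    return a > b ? a : b;   // the usual arithmetic conversions apply
}

// int and double have the common type double.
static_assert(std::is_same_v<decltype(max_of(3, 2.5)), double>);
```

With this signature, max_of(3, 2.5) compiles and returns a double.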

Where this appears in persistent<T>

Inside PersistentAllocator, the construct member is a function template:

template <typename U, typename... Args>
void construct(U *p, Args&&... args) {
    ::new (static_cast<void*>(p)) U(std::forward<Args>(args)...);
}

U and Args are deduced from the call site. When a container (usually through std::allocator_traits) calls alloc.construct(p, 42) with an int* p, the compiler generates construct<int, int>(int*, int&&) for that specific case and emits its body. A different call such as construct(p, "hello", 3.14) generates a different body with a different Args....


Chapter 3: Variadic Templates and Parameter Packs

The idea

Some operations naturally accept any number of arguments. Think of printf, or make_shared<T>(arg1, arg2, ...). C++ supports this with variadic templates — templates whose parameter list ends in ..., accepting zero or more arguments.

Simple example

#include <iostream>

template <typename... Ts>
void print(Ts... args) {
    ((std::cout << args << ' '), ...);   // C++17 fold expression
}

int main() {
    print(1, 2.5, "three");   // prints: 1 2.5 three 
}

typename... Ts declares a template parameter pack — zero or more type parameters. Ts... args is a function parameter pack — the corresponding values. Inside the function, the syntax args... (with the dots after the name) is pack expansion — the compiler unfolds the pack into a comma-separated list.

For print(1, 2.5, "three"), Ts is {int, double, const char*} and args is the three values. The fold expression (expr, ...) repeats expr for each element of the pack, joined with the comma operator.

Why this matters

Without variadic templates, you'd have to write separate overloads for every arity: print(a), print(a, b), print(a, b, c), etc. Variadic templates let you write one definition that handles them all, with full type information for each argument.
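
Two more pack operations, as a small sketch with hypothetical helper names: sizeof... counts the pack without touching the values, and a binary fold combines them with an operator.

```cpp
#include <cassert>
#include <cstddef>

// Count the arguments without looking at their values.
template <typename... Ts>
constexpr std::size_t count_args(Ts...) {
    return sizeof...(Ts);
}

// Add every argument with a C++17 binary left fold; the 0 covers the empty pack.
template <typename... Ts>
constexpr auto sum(Ts... args) {
    return (0 + ... + args);
}

static_assert(count_args() == 0);
static_assert(count_args(1, 2.5, "three") == 3);
static_assert(sum() == 0);
static_assert(sum(1, 2, 3) == 6);
```

One definition of each covers every arity, including zero arguments.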

Where this appears in persistent<T>

The construct method of PersistentAllocator accepts any number of constructor arguments:

template <typename U, typename... Args>
void construct(U *p, Args&&... args) {
    ::new (static_cast<void*>(p)) U(std::forward<Args>(args)...);
}

The pack expansion std::forward<Args>(args)... produces a comma-separated list of forwarded arguments, which the compiler feeds into the constructor of U. If the STL calls construct(p, 3, 4) for U = Point, the expansion becomes Point(3, 4). If it calls construct(p) for a default-constructed Point, the expansion is empty and the call becomes Point().

The class specialization of persistent<T> uses the same idiom:

template <typename... Args>
explicit persistent(Args&&... args) : T(std::forward<Args>(args)...) {}

So persistent<Point>(3, 4) constructs the inherited Point(3, 4), regardless of how many constructor args Point takes.


Chapter 4: Perfect Forwarding

The idea

When you write a function that passes its arguments to another function — a wrapper, a thunk, a factory — you want the wrapped call to behave exactly as if the original caller had called it directly. If the caller passed a temporary (rvalue), the inner call should move from it. If the caller passed a named variable (lvalue), the inner call should copy or take a reference. Perfect forwarding is the C++ idiom for preserving this distinction through a layer of indirection.

Simple example

Consider a logger that wraps any function call:

template <typename F, typename... Args>
auto logged_call(F f, Args&&... args) {
    std::cout << "calling...\n";
    return f(std::forward<Args>(args)...);
}

void process(std::string s) { /* ... */ }

std::string s = "hello";
logged_call(process, s);            // should COPY s
logged_call(process, std::move(s)); // should MOVE s

Without perfect forwarding, both calls inside logged_call would be copies, because once args is sitting inside the function as a named variable, it is an lvalue regardless of what was passed in. You'd lose the move.

How it works — three pieces

Three pieces must appear together for perfect forwarding to work:

  1. A template parameter for the argument type: typename T (or typename... Args).
  2. A forwarding reference in the parameter list: T&& (only a forwarding reference when T is itself a deduced template parameter — otherwise it's an rvalue reference).
  3. A std::forward call inside the function body that re-casts the argument to its original value category: std::forward<T>(arg).

A forwarding reference (T&& in a deduced context) is special: it binds to both lvalues and rvalues, and the deduced T records which. Passed an lvalue s, T becomes std::string&. Passed an rvalue std::move(s), T becomes std::string (no reference). std::forward<T> reads T and re-casts accordingly.

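Leave out std::forward and the forwarding silently degrades to a copy. Leave out the deduced T&& and the parameter stops binding to both lvalues and rvalues, so the wrapper either copies its argument or rejects some callers outright.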
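
To make the difference observable, here is a sketch that uses a hypothetical Tracer type to count copies and moves. forward_to_sink has all three pieces; pass_to_sink omits std::forward:

```cpp
#include <cassert>
#include <utility>

// Hypothetical tracer type that counts how it was constructed.
struct Tracer {
    static inline int copies = 0;
    static inline int moves = 0;
    Tracer() = default;
    Tracer(const Tracer&) { ++copies; }
    Tracer(Tracer&&) noexcept { ++moves; }
};

void sink(Tracer) {}   // takes by value, so each call copies or moves once

template <typename T>
void forward_to_sink(T&& arg) {
    sink(std::forward<T>(arg));   // keeps the caller's value category
}

template <typename T>
void pass_to_sink(T&& arg) {
    sink(arg);                    // arg is a named lvalue here: always copies
}

inline void forwarding_demo() {
    Tracer t;
    forward_to_sink(t);              // lvalue: one copy
    forward_to_sink(std::move(t));   // rvalue: one move
    pass_to_sink(Tracer{});          // rvalue passed in, but copied anyway
}
```

After forwarding_demo() the counters read two copies and one move: the second copy is the move that pass_to_sink lost.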

Where this appears in persistent<T>

The class specialization needs to forward its constructor arguments to the inherited base class:

template <typename... Args>
explicit persistent(Args&&... args) : T(std::forward<Args>(args)...) {}

If a caller writes persistent<Stack>(std::move(some_stack)), this preserves the rvalue-ness and triggers Stack's move constructor. If a caller writes persistent<Stack>(other_stack), the lvalue is preserved and Stack's copy constructor runs. Without std::forward, both would copy — and for large objects that's a silent performance regression.


Chapter 5: Template Specialization

The idea

Sometimes a generic template isn't enough — you want the same name to behave differently for specific types. Template specialization lets you provide alternative definitions for particular template arguments. The compiler picks the most specific match.

Simple example — full specialization

template <typename T>
struct TypeName { static const char* get() { return "unknown"; } };

template <>
struct TypeName<int> { static const char* get() { return "int"; } };

template <>
struct TypeName<double> { static const char* get() { return "double"; } };

TypeName<int>::get();    // "int"
TypeName<double>::get(); // "double"
TypeName<char>::get();   // "unknown" (falls through to primary)

Partial specialization

You can also specialize on a pattern rather than a specific type:

template <typename T>
struct IsPointer { static const bool value = false; };

template <typename T>
struct IsPointer<T*> { static const bool value = true; };  // partial

IsPointer<int>::value;     // false
IsPointer<int*>::value;    // true — matches T*
IsPointer<double*>::value; // true — matches T*

The compiler picks the most specific match — if a type fits both the primary template and a partial specialization, the partial wins.

SFINAE — "Substitution Failure Is Not An Error"

A subtler dispatch mechanism: if substituting a template argument produces an ill-formed type, that template is silently removed from consideration rather than producing a compile error. This is the mechanism behind std::enable_if:

template <typename T,
          typename = std::enable_if_t<std::is_integral_v<T>>>
void only_integers(T x) { /* ... */ }

only_integers(42);      // fine: int is integral
only_integers(3.14);    // error: no matching function (silently removed)

When T = double, std::is_integral_v<double> is false, std::enable_if_t<false> is ill-formed, and this candidate is dropped. The compiler reports "no matching function" instead of a confusing error inside the body.
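
SFINAE becomes a dispatch tool once there are several candidates. The sketch below (the name describe is hypothetical) uses the non-type form of enable_if, because two function templates that differ only in a default template argument would count as the same declaration:

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Viable only when T is an integral type.
template <typename T, std::enable_if_t<std::is_integral_v<T>, int> = 0>
std::string describe(T) { return "integral"; }

// Viable only when T is a floating-point type.
template <typename T, std::enable_if_t<std::is_floating_point_v<T>, int> = 0>
std::string describe(T) { return "floating"; }
```

For describe(42) the second overload drops out during substitution; for describe(3.14) the first one does.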

Where this appears in persistent<T>

The whole reason persistent<T> has two implementations is partial specialization with SFINAE. The forward declaration has a SFINAE slot:

template <typename T,
          template <typename> class Alloc = PersistentAllocator,
          typename E = void>
class persistent;       // forward declaration, no body

The two specializations fill the E slot with std::enable_if<condition>::type:

// Primitive specialization — only viable when T is fundamental or pointer
template <typename T, template <typename> class Alloc>
class persistent<T, Alloc,
                 typename std::enable_if<std::is_fundamental<T>::value
                                      || std::is_pointer<T>::value>::type> {
    T contents;
    /* ... */
};

// Class specialization — only viable when T is neither
template <typename T, template <typename> class Alloc>
class persistent<T, Alloc,
                 typename std::enable_if<!(std::is_fundamental<T>::value
                                        || std::is_pointer<T>::value)>::type>
    : public T {
    /* ... */
};

For persistent<int>, the primitive specialization's enable_if<true>::type is void, matching the default E = void. The class specialization's enable_if<false>::type is ill-formed and silently removed. The primitive wins.

For persistent<Stack>, the reverse happens: the class specialization is viable, the primitive is not, and persistent<Stack> inherits from Stack.

The user never sees this machinery — they just write persistent<int> or persistent<Stack> and get the appropriate behavior. The compiler does the routing.
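
The routing can be reproduced in miniature. The sketch below is not the library's code: wrapper, is_primitive_v, and stores_by_value are hypothetical names, and the allocator parameter is left out to keep the pattern visible.

```cpp
#include <cassert>
#include <type_traits>

template <typename T>
constexpr bool is_primitive_v = std::is_fundamental_v<T> || std::is_pointer_v<T>;

template <typename T, typename E = void>
class wrapper;   // primary template: declared, never defined

// Chosen for fundamental and pointer types: holds the value as a member.
template <typename T>
class wrapper<T, std::enable_if_t<is_primitive_v<T>>> {
public:
    static constexpr bool stores_by_value = true;
    T contents{};
};

// Chosen for everything else: inherits from T.
template <typename T>
class wrapper<T, std::enable_if_t<!is_primitive_v<T>>> : public T {
public:
    static constexpr bool stores_by_value = false;
};

struct Stack { int depth = 0; };   // stand-in for a user-defined class

static_assert(std::is_base_of_v<Stack, wrapper<Stack>>);
static_assert(!std::is_base_of_v<Stack, wrapper<int>>);
```

Because the two conditions are exact negations of each other, exactly one specialization is viable for any T.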


Chapter 6: Type Aliases

The idea

C++ lets you give a new name to an existing type. There are two syntaxes — the old C-style typedef and the modern using — but using reads more naturally and is more powerful.

Simple example

using IntVec = std::vector<int>;
using StringMap = std::map<std::string, std::string>;

IntVec numbers = {1, 2, 3};
StringMap headers;

Compare with typedef:

typedef std::vector<int> IntVec;
typedef std::map<std::string, std::string> StringMap;

The using form reads left-to-right: "the name IntVec is an alias for std::vector<int>." The typedef form requires reading right-to-left, which fights with how variable declarations work.

The killer feature — template aliases

using supports templates; typedef does not:

template <typename T>
using Vec = std::vector<T, MyAllocator<T>>;

Vec<int> v;                  // expands to std::vector<int, MyAllocator<int>>
Vec<std::string> names;

This is impossible with typedef. If you ever need a parameterized type alias, you need using.

Where this appears in persistent<T>

Every STL-required type alias in PersistentAllocator uses using:

template <typename T>
class PersistentAllocator {
public:
    using value_type = T;
    using pointer = T*;
    using const_pointer = const T*;
    using reference = T&;
    using const_reference = const T&;
    using size_type = std::size_t;
    using difference_type = std::ptrdiff_t;
    /* ... */
};

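These aliases are part of the allocator's interface, not just documentation. Containers look them up by name through std::allocator_traits: when std::vector needs the element type its allocator manages, it asks for value_type; when it needs the pointer type, it asks for pointer. Since C++11 only value_type is strictly required, because allocator_traits supplies defaults for the rest (T* for pointer, and so on), and reference / const_reference are no longer consulted. Spelling them out is harmless and keeps the allocator usable with older code.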
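
The lookup can be observed directly with std::allocator_traits. This sketch uses a hypothetical BareAlloc that declares only value_type, to show which names the traits class fills in by default:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <type_traits>

// Hypothetical minimal allocator: value_type, allocate, deallocate, nothing else.
template <typename T>
struct BareAlloc {
    using value_type = T;
    BareAlloc() = default;
    template <typename U>
    BareAlloc(const BareAlloc<U>&) noexcept {}
    T* allocate(std::size_t n) { return static_cast<T*>(::operator new(n * sizeof(T))); }
    void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }
};

using Traits = std::allocator_traits<BareAlloc<int>>;

static_assert(std::is_same_v<Traits::value_type, int>);           // read from the allocator
static_assert(std::is_same_v<Traits::pointer, int*>);             // defaulted by the traits
static_assert(std::is_same_v<Traits::const_pointer, const int*>); // defaulted by the traits
static_assert(std::is_unsigned_v<Traits::size_type>);             // defaulted by the traits
```

Containers go through the same Traits members, which is why an allocator that does define pointer or size_type gets its own choices honored.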


Chapter 7: Nested Type Members

The idea

A class can declare types inside its scope — both type aliases and inner classes. You refer to them with OuterClass::InnerName.

Simple example

class Container {
public:
    using value_type = int;
    
    struct Iterator {
        value_type *ptr;
        value_type& operator*() { return *ptr; }
    };
    
    Iterator begin();
    Iterator end();
};

Container c;
Container::value_type x = 5;     // int
Container::Iterator it = c.begin();

Container::Iterator is a type inside Container. It's qualified by Container:: when used outside the class, like a namespace.

The typename keyword inside templates

When you have a template parameter and you want to access a nested type through it, you need typename to tell the compiler you mean a type, not a value:

template <typename C>
void f() {
    typename C::value_type x;   // x is of type C's value_type
    // Without typename:
    // C::value_type x;         // error: C::value_type is assumed to be a value
    // C::value_type * p;       // parses as a multiplication, not a pointer declaration
}

Why? Because before the template is instantiated, the compiler doesn't know what C is. C::value_type might be a type (a typedef) or a static member (a value). The compiler assumes a dependent name is a value unless told otherwise, so C::value_type x is a syntax error (an expression followed by a stray identifier), and the classic C::value_type * p parses as "multiply C::value_type by p". typename overrides that assumption.
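
A complete, compilable use of the keyword, as a sketch with a hypothetical helper that sums any container exposing a nested value_type (as the standard containers do):

```cpp
#include <cassert>
#include <vector>

template <typename C>
typename C::value_type sum_elements(const C& c) {
    typename C::value_type total{};            // dependent type: typename required
    for (const typename C::value_type& v : c)  // and here
        total += v;
    return total;
}
```

Every mention of C::value_type inside the function body needs the typename prefix, because C is not known until instantiation.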

Template nested templates

You can have a template inside a template:

template <typename T>
class Allocator {
public:
    template <typename U>
    struct rebind {
        using other = Allocator<U>;
    };
};

using NewAlloc = Allocator<int>::rebind<double>::other;
// equivalent to Allocator<double>

Step by step: Allocator<int> gives you the outer template instantiation. ::rebind<double> invokes the inner template with U = double. ::other accesses the type alias defined inside rebind. The whole thing evaluates to Allocator<double>.

Where this appears in persistent<T>

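The STL allocator model relies on rebinding. Containers that need an allocator for a different type than the user supplied (std::list<int> needs an allocator for list-node structs, not for int) use rebind to derive it. Since C++11 they do so through std::allocator_traits<A>::rebind_alloc<U>, which uses the member rebind when it exists and can otherwise work out the result for allocators of the form Alloc<T>; providing rebind explicitly also keeps pre-C++11 code working: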

template <typename T>
class PersistentAllocator {
public:
    template <typename U>
    struct rebind {
        using other = PersistentAllocator<U>;
    };
    /* ... */
};

Inside our operator delete:

alloc.deallocate(static_cast<typename allocator_type::pointer>(ptr), 1);

The typename is required because allocator_type (which is Alloc<T>) is a dependent type (it depends on the template parameter) and pointer is a nested name inside it. Before C++20, the cast would not parse without typename.


Chapter 8: Special Member Functions

The idea

Every C++ class has six functions the compiler can generate automatically: a default constructor, copy constructor, copy assignment, move constructor, move assignment, and destructor. Together they govern how objects come into existence, how they're copied or moved between containers, and how they clean up.

Simple example

struct Plain {
    int x;
};

Plain a{};          // value-initialization: x is 0 (a bare "Plain a;" would leave x uninitialized)
Plain b = a;        // copy constructor: b.x = a.x
b = a;              // copy assignment: same effect
Plain c = std::move(a);  // move constructor: for trivially copyable types, same as copy
// destructor: implicit, does nothing for Plain

You didn't write any of these. The compiler generated them.

When does the compiler generate them?

The rules are subtle:

  • Default constructor: generated only if you declare no constructors of any kind.
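  • Copy constructor / copy assignment: implicitly declared unless you declare them yourself, but defined as deleted if you declare a move constructor or move assignment (or if a member or base class cannot be copied).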
  • Move constructor / move assignment: generated only if you haven't declared any of {destructor, copy ctor, copy assignment, move ctor, move assignment}. Declaring any of these suppresses automatic move generation.
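  • Destructor: generated unless you declare one yourself. The implicit one is defined as deleted only when a member or base class has a deleted or inaccessible destructor.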

This leads to the rule of zero, three, or five:

  • Rule of zero: if you don't manage any resources manually (just have members that manage themselves), don't write any of the six. The compiler-generated versions are correct.
  • Rule of three (pre-C++11): if you write one of {destructor, copy ctor, copy assignment}, you usually need all three — they're entangled.
  • Rule of five (C++11+): same idea, but include move ctor and move assignment.
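
The suppression rule for move operations is easy to observe. In this sketch, Probe is a hypothetical member type that counts copies and moves; the only difference between the two structs is a user-declared destructor:

```cpp
#include <cassert>
#include <utility>

// Hypothetical probe that counts how it gets copied or moved.
struct Probe {
    static inline int copies = 0;
    static inline int moves = 0;
    Probe() = default;
    Probe(const Probe&) { ++copies; }
    Probe(Probe&&) noexcept { ++moves; }
};

struct RuleOfZero { Probe p; };                  // declares none of the six
struct WithDtor { Probe p; ~WithDtor() {} };     // user-declared destructor

inline void special_member_demo() {
    RuleOfZero a;
    RuleOfZero b = std::move(a);   // implicit move constructor: moves p
    WithDtor c;
    WithDtor d = std::move(c);     // no implicit move constructor: copies p
    (void)b;
    (void)d;
}
```

Both lines with std::move compile, which is what makes the fallback to copying easy to miss.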

Converting constructors

A non-special, but related, idea: a templated constructor that converts from one type to another.

template <typename T>
class Box {
public:
    Box() {}
    Box(const Box&) {}                    // copy: same T
    
    template <typename U>
    Box(const Box<U>&) {}                  // converting: different T
};

Box<int> a;
Box<double> b(a);    // calls converting constructor, U = int

The converting constructor is not a copy constructor — Box<int> and Box<double> are different types. It's a template that lets you build one from another.

Where this appears in persistent<T>

PersistentAllocator manually writes its special members and adds a converting constructor:

template <typename T>
class PersistentAllocator {
public:
    PersistentAllocator() noexcept {}                                  // default
    PersistentAllocator(const PersistentAllocator&) noexcept {}        // copy
    
    template <typename U>
    PersistentAllocator(const PersistentAllocator<U>&) noexcept {}     // converting
    /* ... */
};

The converting constructor is required for STL allocator compatibility. Containers use rebind to compute a new allocator type and then construct it from the original allocator via this converting constructor. Without it, std::list<int, PersistentAllocator<int>> fails to compile — the list can't build its PersistentAllocator<ListNode<int>> from its PersistentAllocator<int>.

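All three are noexcept because the standard's allocator requirements say that copying an allocator, including through the converting constructor, must not exit via an exception. A noexcept default constructor is a natural addition, since it does nothing.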
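
The rebind-then-convert step can be checked with a stand-in allocator. MiniAlloc below is hypothetical and draws from the ordinary heap, not from persistent memory; it exists only to show that std::list is satisfied once the converting constructor is present:

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <memory>
#include <new>
#include <type_traits>

template <typename T>
struct MiniAlloc {
    using value_type = T;
    MiniAlloc() noexcept = default;
    template <typename U>
    MiniAlloc(const MiniAlloc<U>&) noexcept {}   // converting constructor
    T* allocate(std::size_t n) { return static_cast<T*>(::operator new(n * sizeof(T))); }
    void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }
};

template <typename A, typename B>
bool operator==(const MiniAlloc<A>&, const MiniAlloc<B>&) noexcept { return true; }
template <typename A, typename B>
bool operator!=(const MiniAlloc<A>&, const MiniAlloc<B>&) noexcept { return false; }

// The allocator type a container derives for some other type, such as its nodes.
using Rebound = std::allocator_traits<MiniAlloc<int>>::rebind_alloc<double>;
static_assert(std::is_same_v<Rebound, MiniAlloc<double>>);
static_assert(std::is_constructible_v<Rebound, const MiniAlloc<int>&>);

inline int list_demo() {
    std::list<int, MiniAlloc<int>> xs(MiniAlloc<int>{});   // list converts this to its node allocator
    xs.push_back(1);
    xs.push_back(2);
    return xs.front() + xs.back();
}
```

Removing the templated constructor makes the second static_assert fail, which is the same failure the list would hit internally.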


Chapter 9: Lambdas and Captures

The idea

A lambda is an anonymous function — defined inline at the point of use. Under the hood, the compiler generates a class with operator() and an instance of that class. Lambdas can also capture variables from the surrounding scope, turning them into "closures."

Simple example

int x = 5;
auto add_x = [x](int n) { return n + x; };   // captures x by value
std::cout << add_x(3);  // 8

std::vector<int> v = {3, 1, 4, 1, 5, 9, 2, 6};
std::sort(v.begin(), v.end(), [](int a, int b) { return a > b; }); 
// sorts descending — lambda is the comparator

The square brackets [...] are the capture list. The parentheses (int n) are the parameter list, like a regular function. The braces contain the body.

Capture modes

| Syntax | Meaning |
|---|---|
| [] | No captures. The body cannot use local variables from the enclosing scope. |
| [x] | Capture x by value (copy stored in the lambda). |
| [&x] | Capture x by reference. |
| [=] | Capture all used outer variables by value. |
| [&] | Capture all used outer variables by reference. |
| [this] | Capture the enclosing object's this pointer (for lambdas inside member functions). |
| [x, &y] | Mix: x by value, y by reference. |

The "default" captures [=] and [&] capture only variables that are actually referenced by the body. Unused variables are not captured.

The lifetime trap

Reference captures ([&x], [&]) hold pointers — if the captured variable goes out of scope while the lambda is still alive, you have a dangling reference. This is the most common lambda bug:

std::function<int()> make_bad() {
    int x = 5;
    return [&] { return x; };  // captures &x
}  // x destroyed here; returned lambda is now broken

For short-lived lambdas (used immediately, like a sort comparator), [&] is safe and convenient. For long-lived lambdas (stored, returned, used asynchronously), prefer [=] or explicit captures so the lambda owns its state.
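
A safe counterpart to make_bad, plus a sketch of what "owns its state" means in practice: a by-value capture is a snapshot taken when the lambda is created, while a by-reference capture sees later changes. The function names are hypothetical.

```cpp
#include <cassert>
#include <functional>

// The closure owns a copy of x, so it stays valid after the function returns.
std::function<int()> make_good() {
    int x = 5;
    return [x] { return x; };
}

inline int snapshot_demo() {
    int n = 1;
    auto by_value = [n] { return n; };    // copies n now
    auto by_ref   = [&n] { return n; };   // reads n at call time
    n = 10;
    return by_value() * 100 + by_ref();   // 1 * 100 + 10
}
```

Both lambdas in snapshot_demo are used while n is still alive, so the reference capture is safe there.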

Where this appears in persistent<T>

The PMDK C++ bindings use lambdas for the transaction body:

auto pop = pool<root>::open(POOL_PATH, "ex1");
auto r = pop.root();

transaction::run(pop, [&] {           // [&] captures r and pop
    r->counter = r->counter + 1;
});

transaction::run is pmem::obj::transaction::run. It takes a pool reference and a callable (the lambda). It opens a transaction, calls the lambda, and either commits on normal return or rolls back if the lambda throws or abort()s.

[&] is a good fit here: the transaction body is short-lived (it runs entirely inside transaction::run), so the references to r and pop cannot dangle. Note that r is a pointer-like handle to the root object, so a by-value capture ([=]) would copy only the handle, and writes through r->counter would still reach the same persistent object. [&] simply avoids those copies.


Chapter 10: Placement New

The idea

The expression new T(args) normally does two things: allocate raw memory from the heap, then construct a T in that memory. Placement new separates these — you supply pre-allocated memory, and new just constructs into it without doing any allocation.

Simple example

#include <new>

alignas(int) char buffer[sizeof(int)];        // raw bytes on the stack, aligned for int
int *p = new (buffer) int(42);                // construct int(42) in buffer
std::cout << *p;                              // 42

using Int = int;   // p->~int() is not valid syntax, so go through an alias
p->~Int();         // (no-op for int, but conceptually: destroy the object)

Syntactically:

  • new (buffer) — the parenthesized argument after new is the placement argument: tells new to use this address.
  • int(42) — the type to construct and the constructor arguments.

No allocation happens. The placement form of operator new is declared in <new> and is trivial: it just returns the address you gave it.

Why this is useful

Three reasons:

  1. Reusing memory from a different allocator. If you allocated bytes via some custom allocator (a memory pool, an mmap'd region, persistent memory), placement new lets you construct objects in that memory without going through malloc.
  2. Pre-allocating capacity. std::vector allocates space for many elements at once but only constructs elements as they're inserted. The rest is raw, awaiting placement new.
  3. In-place reconstruction. Sometimes you need to destroy and recreate an object at the same address, without freeing the memory. Placement new + explicit destructor call does this.

Pairing with destruction

new T(args) allocates and constructs. Its inverse, delete p, destructs and deallocates. Placement new only constructs, so you must call the destructor explicitly and deallocate separately:

T *p = ::new (raw_memory) T(args);
// ... use p ...
p->~T();              // run the destructor
my_dealloc(raw_memory);  // release the memory

For trivially destructible types like int, the explicit destructor call is a no-op, but writing it makes the intent clear.
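
For a type with a real destructor the pairing matters. The sketch below uses a hypothetical Named type with a live-object counter; the storage is a local array, so the "deallocate" step is simply the array going out of scope:

```cpp
#include <cassert>
#include <new>
#include <string>
#include <utility>

struct Named {
    static inline int alive = 0;   // live-object counter for the demo
    std::string name;
    explicit Named(std::string n) : name(std::move(n)) { ++alive; }
    ~Named() { --alive; }
};

inline int placement_demo() {
    alignas(Named) unsigned char storage[sizeof(Named)];   // raw, suitably aligned bytes
    Named* p = ::new (static_cast<void*>(storage)) Named("pool object");
    int during = Named::alive;           // 1: the object exists
    p->~Named();                         // explicit destructor call releases the string
    return during * 10 + Named::alive;   // 1 * 10 + 0
}
```

Skipping the p->~Named() line would leak the std::string's heap buffer even though the storage array itself is reclaimed.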

Where this appears in persistent<T>

The construct method of PersistentAllocator is placement new in disguise:

template <typename U, typename... Args>
void construct(U *p, Args&&... args) {
    ::new (static_cast<void*>(p)) U(std::forward<Args>(args)...);
}

Step by step:

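  • ::new uses a leading :: to force the global placement new, bypassing any class-specific operator new. This matters because a plain new (p) U(...) looks in U's scope first: if U declares its own operator new, that declaration hides the global placement form, and the expression either fails to compile or calls U's overload. We want neither. The goal is to construct in the memory the allocator already provided.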
  • (static_cast<void*>(p)) — the placement argument: use this address.
  • U(std::forward<Args>(args)...) — the constructor, with perfectly-forwarded arguments.

This is essentially how STL containers initialize elements. vector::push_back(x) ends up calling std::allocator_traits<Alloc>::construct(alloc, slot, x), which uses the allocator's construct member when it provides one; that in turn placement-news T(x) into the next free slot. The vector's underlying buffer holds raw bytes until each slot is constructed.


Chapter 11: Operator Overloading

The idea

Built-in operators like +, ==, << are defined by the language for built-in types. For user-defined types, you can give them custom meanings by overloading the operators as functions.

Simple example

struct Vec2 {
    double x, y;
    
    Vec2 operator+(const Vec2& other) const {     // member function
        return {x + other.x, y + other.y};
    }
};

bool operator==(const Vec2& a, const Vec2& b) {   // free function
    return a.x == b.x && a.y == b.y;
}

Vec2 a{1, 2}, b{3, 4};
Vec2 c = a + b;       // calls Vec2::operator+
bool same = a == b;   // calls free operator==

Member vs free functions

Both forms work for most operators. Conventions:

  • Symmetric / binary: prefer free functions. a + b and b + a should look the same. As a free function, neither argument is special.
  • Asymmetric / mutating: prefer member functions. a += b modifies a — the left operand is the object being acted on.
  • Conversion operators (operator T()): must be member functions.
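
The symmetric-versus-mutating convention can be sketched with a hypothetical Meters type that converts implicitly from double. The free operator+ is written in terms of the member operator+=:

```cpp
#include <cassert>

struct Meters {
    double v;
    Meters(double x) : v(x) {}              // implicit conversion from double
    Meters& operator+=(const Meters& o) {   // mutating: member function
        v += o.v;
        return *this;
    }
};

// Symmetric: free function, so either operand may be converted from double.
inline Meters operator+(Meters a, const Meters& b) {
    a += b;
    return a;
}
```

With this layout both Meters(2.0) + 1.5 and 1.5 + Meters(2.0) compile. If operator+ were a member, the second form would not, because the left operand of a member operator is never implicitly converted.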

Common operators to overload

| Operator | Typical use |
|---|---|
| +, -, *, / | Arithmetic. Usually free functions, return by value. |
| ==, !=, <, > | Comparison. Free functions, return bool. |
| = | Assignment. Member function. Returns *this by reference. |
| [] | Subscript. Member function. Returns reference. |
| () | Function call. Member function; turns the class into a callable. |
| ->, * | Pointer dereference. Member function. For smart pointers. |
| operator T() | Implicit conversion to T. Member function. |

Where this appears in persistent<T>

Several places. The simplest: allocator equality is a free function template:

template <typename T1, typename T2>
bool operator==(const PersistentAllocator<T1>&, const PersistentAllocator<T2>&) noexcept {
    return true;
}

template <typename T1, typename T2>
bool operator!=(const PersistentAllocator<T1>&, const PersistentAllocator<T2>&) noexcept {
    return false;
}

All PersistentAllocator instances compare equal regardless of value type, because they all draw from the same global pool.

The primitive specialization of persistent<T> uses several member operators:

inline operator T&() { return contents; }              // conversion to T&
inline T operator->() { ... return load(); }           // arrow operator
persistent& operator=(const T& data) { ... }           // assignment from T

  • operator T&() lets persistent<int> decay to int& automatically. So persistent<int> p; int x = p; works without explicit conversion: the compiler inserts a call to operator int&().
  • operator->() enables pointer_persistent->member for pointer-typed T. The static_assert inside ensures only pointer instantiations compile.
  • operator=(const T&) lets p = 5 work, assigning a raw T value to the wrapper.

Together these let persistent<int> masquerade as a real int in most contexts.
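
The masquerading effect can be tried out with a stripped-down stand-in. The class name transparent is hypothetical and the sketch contains no persistence logic; it only combines the conversion and assignment operators described above:

```cpp
#include <cassert>

// Hypothetical stand-in for the primitive specialization, without persistence.
template <typename T>
class transparent {
    T contents{};
public:
    transparent() = default;
    operator T&() { return contents; }                  // conversion to T&
    transparent& operator=(const T& v) {                // assignment from T
        contents = v;
        return *this;
    }
};

inline int transparent_demo() {
    transparent<int> p;
    p = 5;         // operator=(const T&)
    int x = p;     // operator int&(), then a copy
    int& r = p;    // binds directly to the wrapped int
    r += 1;        // modifies the value inside p
    int y = p;     // now 6
    return x * 10 + y;
}
```

The reference binding is the part that a by-value operator T() could not provide: writes through r land in the wrapper itself.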


Chapter 12: inline Variables and Functions

The idea

C++ has a rule called the One Definition Rule (ODR): every function or variable must have exactly one definition in the entire program. Headers complicate this — if a header defines a function and is included from multiple .cpp files, each .cpp ends up with its own copy of the definition, and the linker refuses to combine them.

inline is the answer: it tells the linker "expect multiple definitions of this; they're all the same; merge them into one."

Simple example — the broken version

// math.h
int square(int x) { return x * x; }    // BAD: definition in header

// main.cpp
#include "math.h"     // gets square's body

// other.cpp
#include "math.h"     // also gets square's body

Both .cpp files compile fine — each ends up with square in its object file. The linker sees two definitions and reports: "multiple definition of square."

The fix

// math.h
inline int square(int x) { return x * x; }

Now the linker tolerates the duplication and picks one definition.

Historical note

inline originally meant "please inline this function", a hint to the optimizer. That meaning has largely faded: modern compilers decide what to inline from their own analysis and treat the keyword as a weak hint at most. The meaning that matters today is the ODR relaxation above.

C++17 inline variables

Until C++17, you couldn't define a non-const variable in a header without it causing the same multiple-definition problem. Workarounds were ugly: extern declaration in the header plus a definition in some .cpp file, or using static (which makes a separate copy per .cpp — usually wrong).

C++17 added inline variables, which work the same way inline functions do:

// counter.h
inline int counter = 0;    // one shared counter across all TUs

Where this appears in persistent<T>

pmem_allocator.hpp is a header that defines functions and a global variable. Without inline, including it from multiple .cpp files would fail to link.

inline PMEMobjpool *global_pool = nullptr;   // C++17 inline variable

__attribute__((constructor))
inline void pmem_alloc_init() { /* ... */ }

inline void *pmem_alloc(size_t size, size_t align) { /* ... */ }
inline void pmem_free(void *ptr) { /* ... */ }

Member functions defined inside a class body are implicitly inline, which is why you don't see inline on PersistentAllocator::allocate, etc. — they're already covered.


Chapter 13: noexcept

The idea

noexcept declares that a function does not throw exceptions. It's both documentation (callers know they don't need to plan for exception cleanup at this call site) and an optimization signal (the compiler and STL can take advantage).

Simple example

int add(int a, int b) noexcept { return a + b; }

void use() {
    int x = add(3, 4);   // no exception cleanup code generated around this call
}

If a noexcept function does throw, std::terminate is called immediately. There's no recovery. noexcept is a hard contract, not advisory.

Why STL containers care

std::vector is the classic example. When a vector outgrows its capacity, it allocates a bigger buffer and has to move elements from the old buffer to the new. If a move could throw mid-transfer, the vector would be left in a half-moved, inconsistent state — some elements moved, some not, no clean way to recover.

So std::vector relocates elements with std::move_if_noexcept: it moves only if the move constructor is noexcept, and otherwise falls back to copy, which is slower but exception-safe (a thrown copy doesn't damage the originals). The one exception is a move-only type: with no copy constructor to fall back to, the vector moves anyway and gives up the strong guarantee.

Result: a copyable class with a non-noexcept move constructor silently loses the move optimization whenever a std::vector reallocates. The class is still correct, but every reallocation copies instead of moves. Adding noexcept to a class's move members is one of the highest-leverage one-keyword changes you can make.
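
The effect can be observed directly. The sketch below uses a hypothetical element type, Tracked, whose move constructor is noexcept or not depending on a template flag, and counts which constructor a std::vector uses when it is forced to reallocate once. It assumes C++17 (for the inline static member and the _v traits).

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

struct Counts { int copies = 0; int moves = 0; };

// Hypothetical element type; NoexceptMove toggles the move ctor's exception spec.
template <bool NoexceptMove>
struct Tracked {
    static inline Counts counts{};
    Tracked() = default;
    Tracked(const Tracked&) { ++counts.copies; }
    Tracked(Tracked&&) noexcept(NoexceptMove) { ++counts.moves; }
};

static_assert(std::is_nothrow_move_constructible_v<Tracked<true>>);
static_assert(!std::is_nothrow_move_constructible_v<Tracked<false>>);

// Build a one-element vector, then force exactly one reallocation.
template <bool NoexceptMove>
Counts relocate_once() {
    using Elem = Tracked<NoexceptMove>;
    std::vector<Elem> v;
    v.reserve(1);
    v.emplace_back();               // constructed in place, no copy or move
    Elem::counts = Counts{};        // reset the counters
    v.reserve(v.capacity() + 1);    // larger than capacity: must relocate the element
    return Elem::counts;
}
```

With the noexcept move constructor the relocation is one move; without it, the same relocation is one copy.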

When to add noexcept

Add it when you can honestly promise no exceptions:

  • Trivial getters and setters
  • Move constructors / move assignments of types with no exception-throwing members
  • Functions that only call other noexcept functions and built-in operations
  • Destructors (already implicitly noexcept by default — don't remove that)

Don't add it speculatively to functions that might throw. A wrong noexcept is worse than no noexcept because the cleanup path is std::terminate.
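
One way to keep these promises honest is to check them at compile time. The sketch below uses the noexcept operator and a standard type trait; the functions add and parse and the Handle type are illustrative names, not part of the library.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

int add(int a, int b) noexcept { return a + b; }
int parse(const std::string& s) { return std::stoi(s); }   // std::stoi can throw

// The noexcept operator inspects declarations; it never evaluates the call.
static_assert(noexcept(add(1, 2)));
static_assert(!noexcept(parse(std::string{})));

// Hypothetical resource handle with a noexcept move constructor.
struct Handle {
    int fd = -1;
    Handle() = default;
    explicit Handle(int f) : fd(f) {}
    Handle(Handle&& other) noexcept : fd(std::exchange(other.fd, -1)) {}
};

// If someone later removes noexcept from the move ctor, this line stops compiling.
static_assert(std::is_nothrow_move_constructible_v<Handle>);
```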

Where this appears in persistent<T>

The STL allocator concept requires certain special members to be noexcept:

PersistentAllocator() noexcept {}
PersistentAllocator(const PersistentAllocator&) noexcept {}

template <typename U>
PersistentAllocator(const PersistentAllocator<U>&) noexcept {}

void deallocate(pointer p, size_type n) noexcept { pmem_free(p); }

If any of these threw, STL containers would corrupt themselves trying to recover. deallocate is noexcept because freeing memory must succeed — if pmem_free somehow failed, we'd terminate rather than leave the container in a broken state.

allocate is not noexcept — it can legitimately throw std::bad_alloc if the pool is out of space. The STL knows this and handles it.


Chapter 14: GCC/Clang Attributes

The idea

Compiler-specific extensions for attaching metadata to code: hints to the optimizer, instructions for the linker, behavior modifiers. They're not standard C++, but GCC and Clang both support them, and they're widespread enough to be treated as portable across those compilers.

Simple example

__attribute__((unused))
int debug_value = 0;                 // silences "unused variable" warnings

void critical() __attribute__((always_inline));   // force inlining

void deprecated_function() __attribute__((deprecated("use new_function instead")));

The syntax __attribute__((name)) or __attribute__((name(args))) attaches the named attribute. The compiler reads it and adjusts behavior accordingly.

Attributes we use

constructor and destructor

__attribute__((constructor))
void init_things() { /* runs automatically before main() */ }

__attribute__((destructor))
void cleanup_things() { /* runs automatically after main() returns */ }

These mark functions to run at program load/unload. The C runtime calls all constructor functions before main and all destructor functions after main (or after exit). Order between multiple constructors from different translation units is unspecified — don't depend on it.

Used for: setting up global state that must exist before main starts (memory pools, logging systems, runtime libraries).
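
A minimal sketch of the constructor attribute at work, assuming GCC or Clang. The globals g_ready and g_init_calls and the helper ready_before_main are illustrative names. They are plain constant-initialized variables on purpose: those are set up at load time, so a constructor function can safely write to them regardless of ordering.

```cpp
#include <cassert>

// Constant-initialized globals: in place before any constructor function runs.
static bool g_ready = false;
static int  g_init_calls = 0;

// Runs automatically before main(); nobody calls it explicitly.
__attribute__((constructor))
static void init_things() {
    g_ready = true;
    ++g_init_calls;
}

// By the time main() (or anything called from it) runs, the flag is set.
bool ready_before_main() { return g_ready && g_init_calls == 1; }
```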

always_inline

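// On a function definition the attribute goes before the declarator;
// GCC rejects it in the trailing position (Clang accepts it with a warning).
__attribute__((always_inline)) inline int load() { return contents; }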

Stronger than inline (which is a hint). always_inline insists: refuse to compile rather than fall back to a regular call. Use for performance-critical tiny functions where the call overhead matters.

Standard attribute syntax

Since C++11, C++ has had its own attribute syntax, [[attribute]], and the set of standard attributes has grown with each revision. Some GCC attributes have standard equivalents: [[noreturn]] (C++11), [[deprecated]] (C++14), [[nodiscard]] and [[maybe_unused]] (C++17). The ones we use (constructor, destructor) have no standard equivalent, so the GCC syntax is still the usual practice.
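
For comparison, here is a short sketch of the standard spellings. The function names checked_add, old_add, fail, and positive_or_fail are made up for illustration.

```cpp
#include <cassert>
#include <stdexcept>

// Warn if a caller ignores the result.
[[nodiscard]] int checked_add(int a, int b) { return a + b; }

// Warn at every use, with a message.
[[deprecated("use checked_add instead")]]
int old_add(int a, int b) { return a + b; }

// Promise that the function never returns normally.
[[noreturn]] void fail(const char* msg) { throw std::runtime_error(msg); }

int positive_or_fail(int v) {
    if (v > 0) return v;
    fail("value must be positive");   // no return needed after a [[noreturn]] call
}

// Standard counterpart of __attribute__((unused)).
[[maybe_unused]] static int debug_value = 0;
```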

Where this appears in persistent<T>

pmem_allocator.hpp uses constructor and destructor to manage the global pool lifecycle:

__attribute__((constructor))
inline void pmem_alloc_init() {
    // open or create the PMDK pool
    global_pool = pmemobj_open(get_pool_path(), "pmem_pool");
    if (!global_pool)
        global_pool = pmemobj_create(get_pool_path(), "pmem_pool", PMEM_POOL_SIZE, 0666);
}

__attribute__((destructor))
inline void pmem_alloc_fini() {
    if (global_pool) pmemobj_close(global_pool);
}

The user never calls these. Any program that includes the header gets a pool opened before main runs and closed after main exits. The mechanism mirrors how numa-lib opens its UMF jemalloc pools — same pattern, different backend.

persistenttype.hpp uses always_inline on load and store:

inline T load() __attribute__((always_inline)) { return contents; }
inline void store(T data) __attribute__((always_inline)) { contents = data; }

These are trivial wrappers — letting the compiler not-inline them would add a function call to every read and write of a persistent value. always_inline makes them disappear at the call site, reducing the wrapper to zero cost.


Chapter 15: Header Guards

The idea

If a header is included multiple times in the same translation unit, directly or indirectly through some other header, its contents are processed multiple times. Type definitions get repeated, which is a compile error (a type may be declared many times but defined only once per translation unit). Header guards prevent this by making subsequent inclusions skip the body.

Simple example — the problem

// foo.h
struct Foo { int x; };
// bar.h
#include "foo.h"     // pulls in Foo
struct Bar { Foo f; };
// main.cpp
#include "foo.h"     // Foo defined here
#include "bar.h"     // bar.h also includes foo.h — Foo defined AGAIN — error

The compiler reports: "redefinition of struct Foo".

The classic fix — macro guards

// foo.h
#ifndef FOO_H
#define FOO_H

struct Foo { int x; };

#endif

First inclusion: FOO_H isn't defined, so the body is processed and FOO_H becomes defined. Second inclusion: FOO_H is already defined, so the body is skipped entirely.
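
The mechanism can be simulated in a single file by pasting the guarded header twice, which is effectively what the preprocessor does when foo.h is reached through two include paths. This is only a sketch; read_x is a helper added for illustration.

```cpp
#include <cassert>

// ---- first "inclusion" of foo.h: FOO_H is undefined, body is processed ----
#ifndef FOO_H
#define FOO_H
struct Foo { int x; };
#endif

// ---- second "inclusion" of foo.h: FOO_H is defined, body is skipped ----
#ifndef FOO_H
#define FOO_H
struct Foo { int x; };
#endif

// Foo was defined exactly once, so this compiles.
int read_x(const Foo& f) { return f.x; }
```

Deleting the #ifndef / #define / #endif lines from both copies turns this into the "redefinition of struct Foo" error from the previous section.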

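The macro name must be unique: if two unrelated headers both used FOO_H as their guard, including one would mask the other. Conventional naming: path-like, all caps, with separators replaced by underscores, e.g. MY_PROJECT_FOO_BAR_H. Strictly speaking, names that begin with an underscore followed by a capital letter (such as _PMEM_ALLOCATOR_HPP_) or that contain a double underscore are reserved for the implementation; they are common in the wild and work in practice, but new guards are better off without the leading underscore.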

The modern fix — #pragma once

// foo.h
#pragma once

struct Foo { int x; };

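#pragma once is non-standard but supported by every major compiler. It tells the compiler "include this file at most once per translation unit." It is shorter than a macro guard and cannot suffer from guard-name collisions. It was historically also faster, because the compiler can skip the second include without reopening the file, but modern compilers apply the same shortcut to files wrapped in a conventional macro guard, so the speed difference is usually negligible.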

Belt and suspenders

Many headers use both:

#pragma once
#ifndef MY_HEADER_H
#define MY_HEADER_H

// ...

#endif

Redundant but harmless. The macro guard is standard C++ and works everywhere; the pragma is a convenience on the compilers that support it.

Where this appears in persistent<T>

Both library headers use both:

// pmem_allocator.hpp
#pragma once
#ifndef _PMEM_ALLOCATOR_HPP_
#define _PMEM_ALLOCATOR_HPP_
// ...
#endif

// persistenttype.hpp  
#pragma once
#ifndef PERSISTENTTYPE_HPP
#define PERSISTENTTYPE_HPP
// ...
#endif

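Without these, including persistenttype.hpp twice in the same translation unit (for example once directly and once through another header) would attempt to redefine PersistentAllocator, pmem_alloc_init, and so on, producing redefinition errors at compile time. Guards do nothing for the separate problem of the same definitions appearing in two different .cpp files; that is what inline (Chapter 12) solves. The first lesson of header writing is to never skip the guard.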


Chapter 16: Copy and Move Semantics

The idea

There are two ways to build a new object from an existing one. Copying duplicates the source — both source and destination remain fully valid afterward, each with its own resources. Moving transfers ownership of the source's resources to the destination, leaving the source in a valid but emptied "husk" state. Moving exists because copying is often expensive (allocating fresh memory, deep-copying contents), and the source is sometimes about to be destroyed anyway — in which case stealing its resources is essentially free.

Chapter 8 introduced the six special member functions. This chapter goes deeper on the two pairs that matter most: copy ctor + copy assignment, and move ctor + move assignment.

Simple example

A class that owns a heap allocation makes the difference visible:

class StringBox {
    char* data;
    size_t len;
public:
    StringBox(const char* s) {
        len = std::strlen(s);
        data = new char[len + 1];
        std::memcpy(data, s, len + 1);
    }

    // Copy: allocate fresh memory, duplicate contents.
    StringBox(const StringBox& other)
        : data(new char[other.len + 1]), len(other.len) {   // listed in declaration order
        std::memcpy(data, other.data, len + 1);
    }

    // Move: steal the pointer, null out source.
    StringBox(StringBox&& other) noexcept
        : data(other.data), len(other.len) {
        other.data = nullptr;
        other.len = 0;
    }

    ~StringBox() { delete[] data; }
};

Copy cost is O(n): one allocation plus a memcpy. Move cost is O(1): four scalar assignments (two to take the pointer and length, two to reset the source), no allocation.

When you have a temporary that's about to die, moving is essentially free. When you need both source and destination to coexist as independent owners, you must copy.

lvalues, rvalues — how the compiler picks copy vs move

The compiler chooses copy or move by looking at the value category of the source expression. There's no runtime decision — it's pure compile-time overload resolution.

  • lvalue: has a name, persists past its current expression. Variable references, member accesses, the result of *ptr. You can "still use it" after this expression.
  • rvalue: temporary, no enduring identity, about to die. Function return-by-value, literals like 42, the result of std::move(x). The compiler knows nobody else will see it again.
StringBox a("hello");
StringBox b = a;              // a is lvalue → copy ctor chosen
StringBox c = make_box();     // rvalue → move ctor selected (usually elided; since C++17 the result initializes c directly)
StringBox d = std::move(a);   // std::move(a) is rvalue → move ctor chosen

After std::move(a), the variable a still exists as a name — but its contents have been moved out. It's now in a "moved-from" state: valid (destroying it is fine, assigning to it is fine), but its value is unspecified. Don't read from it expecting a particular value.

T&& — rvalue references

The mechanism that makes overload resolution pick the move ctor is the parameter type. A T&& (two ampersands) is an rvalue reference — it binds only to rvalues, never to lvalues.

void f(const StringBox&);   // "copy-friendly" overload: accepts anything
void f(StringBox&&);         // "move-friendly" overload: accepts only rvalues

StringBox a("x");
f(a);              // f(const StringBox&) chosen — a is lvalue
f(std::move(a));   // f(StringBox&&) chosen — std::move(a) is rvalue
f(StringBox("y")); // f(StringBox&&) chosen — temporary is rvalue

Both overloads exist; the compiler picks based on value category. The copy ctor's const T& parameter binds to both lvalues and rvalues, but when a move ctor is also present with T&&, overload resolution prefers the move version for rvalues — it's a closer match.
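
The selection can be made visible with a pair of overloads that simply report which one ran. The function name which is illustrative. The last case shows a detail worth remembering: a named rvalue reference is itself an lvalue expression.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Report which overload was selected.
inline const char* which(const std::string&) { return "lvalue"; }
inline const char* which(std::string&&)      { return "rvalue"; }

inline void value_category_demo() {
    std::string s = "x";
    assert(std::string(which(s)) == "lvalue");                 // named variable
    assert(std::string(which(std::move(s))) == "rvalue");      // cast to rvalue
    assert(std::string(which(std::string("y"))) == "rvalue");  // temporary

    // r has a name, so the expression r is an lvalue despite its type.
    std::string&& r = std::string("z");
    assert(std::string(which(r)) == "lvalue");
}
```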

std::move is a cast, not a move

A common confusion: std::move(x) does not perform a move. It is a function template that simply casts its argument to an rvalue reference. The actual move work happens later, because the cast steers the call into the move-ctor / move-assignment overload instead of the copy version.

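Effectively (simplified to the lvalue case; the real std::move takes a forwarding reference and returns std::remove_reference_t<T>&&):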

template <typename T>
constexpr T&& move(T& x) noexcept {
    return static_cast<T&&>(x);
}

So auto y = std::move(x); is two steps: the cast (free), then the move ctor running (the work).
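
A short sketch that separates the two steps, using std::string as the example type. The function name move_is_a_cast_demo is made up for illustration.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

inline void move_is_a_cast_demo() {
    std::string src = "hello";

    // Only a cast: binding the result to a reference moves nothing.
    std::string&& ref = std::move(src);
    assert(src == "hello");
    assert(&ref == &src);

    // The move happens when a move constructor consumes the rvalue.
    std::string dst = std::move(src);
    assert(dst == "hello");

    // On a const object the cast yields const T&&, which only the copy
    // constructor can bind, so this line copies.
    const std::string fixed = "const";
    std::string copy = std::move(fixed);
    assert(fixed == "const" && copy == "const");
}

// The result type of std::move on an int lvalue is int&&.
static_assert(std::is_same_v<decltype(std::move(std::declval<int&>())), int&&>);
```

The const case is a common source of accidental copies: std::move compiles without complaint but cannot enable a move.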

Why noexcept matters for moves

The standard library treats noexcept move operations as a contract. The clearest case is std::vector: when it needs to reallocate (grow its buffer), it has to relocate every element. If the elements' move constructors can throw, the vector can't guarantee strong exception safety mid-reallocation: a thrown exception halfway through leaves some elements already moved into the new buffer with their originals emptied, and moving them back could throw again. So std::vector falls back to copy (not move) during reallocation unless the element's move ctor is marked noexcept (or the type has no copy ctor at all).

Practical consequence: if you write a move ctor or move assignment, mark it noexcept — otherwise you silently lose its benefit in the most common use case.

When the compiler synthesizes the moves (and when it refuses)

This is the rule that bites people:

The implicit move ctor and move assignment are generated only if the class declares none of: destructor, copy ctor, copy assignment, move ctor, move assignment.

Declaring any one of these — even with = default — suppresses automatic generation of the moves. The original idea: if you've gone to the trouble of writing one resource-management hook, you probably need to think about all five. The compiler refuses to guess.

This produces an awkward situation: a class that needs a custom destructor (because it owns memory, a file handle, a lock) is exactly the kind of class where moves matter most — but it doesn't get them implicitly. You have to declare them explicitly.

Hence the rule of five: if you declare one of {dtor, copy ctor, copy assignment, move ctor, move assignment}, you usually need to declare (or explicitly delete or = default) all five. Declaring none — and letting the compiler synthesize everything — is the rule of zero, the preferred path when your data members already manage themselves (e.g., std::string, std::vector).
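
The suppression rule can be checked with type traits. The three structs below are illustrative; each owns a std::unique_ptr, which is movable but not copyable, so a missing move operation cannot hide behind a copy fallback.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

// Rule of zero: nothing declared, the compiler synthesizes the moves.
struct RuleOfZero { std::unique_ptr<int> p; };

// Declaring only a destructor suppresses the implicit moves.
struct DtorOnly {
    std::unique_ptr<int> p;
    ~DtorOnly() {}
};

// Rule of five in action: ask for the moves back explicitly.
struct DtorPlusMoves {
    std::unique_ptr<int> p;
    DtorPlusMoves() = default;
    ~DtorPlusMoves() {}
    DtorPlusMoves(DtorPlusMoves&&) noexcept = default;
    DtorPlusMoves& operator=(DtorPlusMoves&&) noexcept = default;
};

static_assert(std::is_move_constructible_v<RuleOfZero>);
static_assert(!std::is_move_constructible_v<DtorOnly>);
static_assert(!std::is_move_assignable_v<DtorOnly>);
static_assert(std::is_move_constructible_v<DtorPlusMoves>);
static_assert(std::is_move_assignable_v<DtorPlusMoves>);
```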

Where this appears in pmem_ptr<T>

pmem_ptr<T> is a small wrapper around a PMEMoid, a 16-byte POD struct of {pool_uuid_lo, off}. There's no resource for the move ctor to "steal": the move and the copy are the same bitwise copy of 16 bytes and take the same time. So we default both:

template <typename T>
class pmem_ptr {
    PMEMoid oid_;
public:
    pmem_ptr() : oid_(OID_NULL) {}
    pmem_ptr(const pmem_ptr&) = default;     // copy ctor: 16-byte memcpy
    pmem_ptr(pmem_ptr&&)      = default;     // move ctor: same as copy for POD

    pmem_ptr& operator=(const pmem_ptr&);    // user-defined: snapshots before write
    // move assignment: NOT declared
};

The interesting part is operator=. We user-define it because every write to a pmem_ptr's OID slot needs to snapshot into the active transaction's undo log first. The moment we do this, the implicit move assignment is not generated (by the rule above) — b = std::move(a) falls back to our copy assignment.

For pmem_ptr this is fine: the copy path does exactly what we want (snapshot, then assign 16 bytes), and there's no separate "move" optimization to lose. If we wanted a distinct move assignment, we'd write it explicitly:

pmem_ptr& operator=(pmem_ptr&& other) noexcept {
    snapshot_if_pmem();
    oid_ = other.oid_;
    other.oid_ = OID_NULL;
    return *this;
}

But this gives no benefit here — there's no allocation to skip, no pointer to steal. We just inherit the copy path.
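
The fallback is easy to verify with a stand-in type. Slot below is hypothetical; it only mirrors the set of special members pmem_ptr declares (defaulted copy and move ctors, user-defined copy assignment, no move assignment) and counts how often the copy assignment runs.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for pmem_ptr: same set of declared special members.
struct Slot {
    int value = 0;
    int copy_assigns = 0;

    Slot() = default;
    Slot(const Slot&) = default;
    Slot(Slot&&) = default;

    // User-defined copy assignment; no move assignment is declared.
    Slot& operator=(const Slot& other) {
        value = other.value;
        ++copy_assigns;           // stands in for "snapshot, then write"
        return *this;
    }
};

// Still "move assignable", but only via the const Slot& overload.
static_assert(std::is_move_assignable_v<Slot>);

inline int assign_from_rvalue() {
    Slot a;
    a.value = 7;
    Slot b;
    b = std::move(a);             // binds to operator=(const Slot&)
    assert(b.value == 7);
    return b.copy_assigns;
}
```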

The broader lesson: every time you write one of the five special members, pause and ask: have I just silently suppressed the others? If so, do I want them back via = default, or is suppression deliberate?


Chapter 17: Object Layout and reinterpret_cast

The idea

An object's layout is the precise byte-by-byte arrangement of its fields in memory. The compiler decides this layout based on field declarations, alignment, padding, and inheritance. Most of the time you don't care — obj.x = 5 and &obj are sufficient. But when two distinct types happen to share the same layout, you can reinterpret one as the other and access the same bytes through a different type. This trick is called reinterpret_cast, and it underlies several real-world patterns including the numa paper's type-coercion approach.

It also has well-defined failure modes — what looks like a free type conversion can silently read the wrong bytes if the layouts don't match.

Simple example

Two types with identical fields:

struct PointA {
    int x;
    int y;
};

struct PointB {
    int a;
    int b;
};

PointA p{3, 4};
PointB* q = reinterpret_cast<PointB*>(&p);
// q->a reads p.x at offset 0 → 3
// q->b reads p.y at offset 4 → 4

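This works in practice because both types are standard-layout and have the same sequence of field types. The compiler places x at offset 0 and y at offset 4 in PointA, and does the same for a and b in PointB. The bytes are interchangeable. (Formally, reading a PointA through a PointB* still runs into the strict aliasing rule; the next section comes back to that caveat.)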

Now a mismatch:

struct Tiny {
    int    x;
};

struct Big {
    int    x;
    double extra;
};

Tiny t{42};
Big* b = reinterpret_cast<Big*>(&t);
b->x;       // 42 — okay, same offset
b->extra;   // GARBAGE — reads 8 bytes past the end of t

On typical 64-bit ABIs Big::extra lives at offset 8 (double is 8-byte aligned, so 4 bytes of padding follow x), in memory that Tiny never allocated. The compiler accepted the cast; the runtime returns whatever's adjacent in memory, and formally the read is undefined behavior. There's no type check. reinterpret_cast tells the compiler "trust me, this is fine"; if you're wrong, you get a silent corruption bug.

Standard layout — when is a cast safe?

The C++ standard defines a category called standard-layout type. Standard-layout types have predictable enough memory layout that you can reason about offsets and reinterpret_cast between layout-compatible ones. The full rule list is long, but the practically important ones:

  • All non-static members are in the same access section (e.g. all public).
  • No virtual functions, no virtual bases.
  • All base classes (if any) are also standard-layout.
  • No base class has the same type as the first non-static data member.
  • No reference members.

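A POD-like struct without inheritance and without virtual is standard-layout. Two standard-layout types with the same sequence of member types in the same order are layout-compatible: their members sit at the same offsets. That guarantees the bytes line up; it does not by itself make access through a reinterpret_cast pointer well-defined. Under the strict aliasing rule, reading a PointA through a PointB* is formally undefined behavior, even though it works in practice for "same fields in the same order" cases on mainstream compilers. The strictly conforming ways to reuse the bytes are std::memcpy, C++20 std::bit_cast, or the common-initial-sequence rule for unions.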
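
The layout assumptions can be written down as compile-time checks, and the bytes can be reused without aliasing by copying them. A minimal sketch using the PointA and PointB types from the simple example; to_point_b is a helper name introduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

struct PointA { int x; int y; };
struct PointB { int a; int b; };

// Spell out what the cast silently assumes.
static_assert(std::is_standard_layout_v<PointA> && std::is_standard_layout_v<PointB>);
static_assert(std::is_trivially_copyable_v<PointA> && std::is_trivially_copyable_v<PointB>);
static_assert(sizeof(PointA) == sizeof(PointB));
static_assert(offsetof(PointA, x) == offsetof(PointB, a));
static_assert(offsetof(PointA, y) == offsetof(PointB, b));

// Copy the bytes instead of aliasing them: no strict-aliasing concerns.
inline PointB to_point_b(const PointA& p) {
    PointB q;
    std::memcpy(&q, &p, sizeof q);
    return q;
}
```

If someone later adds a field to one struct but not the other, the static_asserts fail at compile time instead of the cast failing silently at runtime.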

Vtable wrinkle

A class with virtual functions is not standard-layout. The first 8 bytes (on a 64-bit system) are typically the vtable pointer, not the first declared field:

struct Base {
    virtual void f() {}
    int x;
};

Base b;
b.x;        // not at offset 0! offset 8 (after vtable pointer)

This means you cannot naively reinterpret_cast between a virtual class and a non-virtual one — the offsets don't match.

But on mainstream ABIs, virtual classes with the same vtable structure can in practice be cast to each other. If both have the same virtual methods in the same order (so the vtables are isomorphic), the vtable pointers point to the same kind of structure, and the field offsets after the vtable pointer align. The C++ standard does not guarantee any of this; it relies on the ABI. This is what the numa paper exploits.

Where this matters: the numa paper's cast trick

From paper §3.5, Fig. 4, lines 4-7:

numa<Point, 0>* p0 = new numa<Point, 0>();
p0->x = 42.0;
Point* p = reinterpret_cast<Point*>(p0);   // cast added by their compiler tool
cout << p->x;                               // still reads 42.0, now through the Point* view

This works because numa<Point, 0> and Point are deliberately laid out the same way:

  • Same fields (each wrapped, but numa<double, 0> is just {double contents;} — same 8 bytes as double).
  • Same virtual methods in the same order — vtables align.
  • No new fields added by the wrapper.

So although Point and numa<Point, 0> are not standard-layout in the formal sense (they have virtual methods), their layouts match closely enough that the cast works in practice. The numa typer can freely insert reinterpret_cast to convert between them, which is how it preserves the user's Stack* references inside method bodies.

Why this doesn't transfer to persistent<T>

For us, the equivalent would be reinterpret_cast<Stack*>(new persistent<Stack>()). But the layouts diverge at the first field:

| Field | Stack | persistent<Stack> |
| --- | --- | --- |
| top | Node* (8 bytes) | pmem_ptr<persistent<Node>> (16 bytes) |
| size | int (4 bytes) | persistent<int> (4 bytes, same) |

Stack::top starts at offset 0 and is 8 bytes; persistent<Stack>::top starts at offset 0 but is 16 bytes (it holds a PMEMoid, which is {uint64_t pool_uuid_lo; uint64_t off;}). After the first field:

  • Stack::size is at offset 8.
  • persistent<Stack>::size is at offset 16.

A reinterpret_cast<Stack*>(persistent_stack_ptr) would read 8 bytes of OID as a Node* — meaningless. And size would read the wrong bytes too. Total silent corruption.

This is unavoidable in our design. PMEMoids have to be 16 bytes (pool ID plus offset, so the pointer survives process restart). Raw Node* is 8 bytes and doesn't survive. So our wrapper must use a different in-memory representation, and the layout shift is the cost.
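
The offset shift can be reproduced with stand-in types. Everything below is hypothetical: FakeOid only imitates the shape of a PMEMoid (two 64-bit words), and PlainStack / PersistentShape imitate the field order of Stack and persistent<Stack> without any of the real wrapper machinery.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Node;  // only used through pointers

// Hypothetical stand-in for PMEMoid: two 64-bit words, 16 bytes.
struct FakeOid { std::uint64_t pool_uuid_lo; std::uint64_t off; };

struct PlainStack      { Node*   top; int size; };   // shape of Stack
struct PersistentShape { FakeOid top; int size; };   // shape of persistent<Stack>

static_assert(sizeof(FakeOid) == 16);
static_assert(offsetof(PersistentShape, size) == 16);

// size sits right after a pointer in one type and after 16 bytes in the other.
inline bool size_offsets_differ() {
    return offsetof(PlainStack, size) != offsetof(PersistentShape, size);
}
```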

Where this appears in persistent<T>

We deliberately rejected layout compatibility:

  • persistent<Stack>* is not a drop-in replacement for Stack* via reinterpret_cast.
  • Methods of persistent<Stack> operate on pmem_ptr<persistent<Node>> and persistent<int> throughout — never coercing back to Node* / int.
  • pmem_ptr<T>'s operator bool is marked explicit and there is no operator T* — meaning you can't accidentally convert a pmem_ptr to a raw T* in passing. You have to call .get() explicitly, which is a deliberate "I'm taking a raw pointer view" act.

The price: methods that would have been polymorphic across persistent and non-persistent code via reinterpret_cast (the numa pattern) instead have to be duplicated — one body for Stack, another for persistent<Stack>. The Phase 5 typer will mechanize this duplication, but the underlying reason is layout incompatibility from this chapter.

The takeaway

Two types with the same layout are reinterpret-cast-compatible; two types with different layouts are not. reinterpret_cast is a low-level escape hatch, not a type conversion — it changes the interpretation of a pointer without changing the bits it points to. If the bits don't match the new type's expectations, you get silent corruption. Always reason explicitly about field offsets before reaching for it.


Chapter 18: const Correctness and Const Overloading

The idea

When you mark a member function const, you are promising the compiler that calling it will not modify the object's observable state. The compiler enforces that promise by treating *this as a const T inside that function — every non-mutable field becomes effectively const, every non-const member call on this fails to compile.

Practically, this matters because the constness of an object propagates outward. If a function takes a const Widget&, only const-qualified methods can be called on it. If a class has a Widget field, its own const methods see that field as const and can call only Widget's const methods on it. A single non-const accessor at the bottom of a chain makes the whole chain uncallable from any const context above it.

Simple example

class Counter {
    int n;
public:
    int get()       { return n; }   // non-const
    void inc()     { ++n; }
};

void print_it(const Counter& c) {
    std::cout << c.get();   // ERROR: 'get' is not const-qualified
}

The fix is one keyword:

int get() const { return n; }

Now c.get() works whether c is const or not. A non-const object binds happily to a const member function — the constness of *this is a contract about what the function will do, not a requirement about what the caller must be.

When you need two overloads

const-qualifying a function that returns by value (like get() above) is essentially free — one overload covers both cases. But if a function returns a reference to a member, you usually need two versions:

class Buffer {
    std::vector<int> data;
public:
    int&       at(size_t i)       { return data[i]; }   // for writes
    const int& at(size_t i) const { return data[i]; }   // for reads on const objects
};

Why both? Because the return type must reflect the constness of the object. If you only kept the const overload:

const int& at(size_t i) const;   // only this

Then buf.at(0) = 42; fails to compile, because the call returns const int& and you can't assign to a const reference. And if you only kept the non-const one, you couldn't call at on a const Buffer at all.
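
A compilable version of the pair, with compile-time checks on which return type each overload produces. The constructor and the write_then_read helper are additions for the sake of a runnable sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

class Buffer {
    std::vector<int> data;
public:
    explicit Buffer(std::size_t n) : data(n, 0) {}
    int&       at(std::size_t i)       { return data[i]; }   // for writes
    const int& at(std::size_t i) const { return data[i]; }   // for reads on const objects
};

// The return type follows the constness of the object expression.
static_assert(std::is_same_v<decltype(std::declval<Buffer&>().at(0)), int&>);
static_assert(std::is_same_v<decltype(std::declval<const Buffer&>().at(0)), const int&>);

inline int write_then_read() {
    Buffer buf(4);
    buf.at(0) = 42;               // non-const overload
    const Buffer& view = buf;
    return view.at(0);            // const overload
}
```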

The same pattern applies to conversion operators:

class Wrapper {
    int value;
public:
    operator int&()             { return value; }   // allows int& r = w; r = 5;
    operator const int&() const { return value; }   // allows int n = cw; for a const Wrapper cw
};

Overload resolution picks based on whether the object expression is const. Two near-identical bodies, but the constness of the return type is what carries the semantic difference.
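
A small usage sketch of the two conversion operators. The explicit constructor and the wrapper_demo function are additions for illustration. Note that the write goes through a bound reference: built-in assignment does not apply user-defined conversions to its left operand, so the conversion has to happen first.

```cpp
#include <cassert>

class Wrapper {
    int value;
public:
    explicit Wrapper(int v) : value(v) {}
    operator int&()             { return value; }   // chosen for non-const objects
    operator const int&() const { return value; }   // chosen for const objects
};

inline int wrapper_demo() {
    Wrapper w(1);
    int& ref = w;                 // non-const conversion yields a writable reference
    ref = 5;                      // writes into w's storage

    const Wrapper& cw = w;
    int n = cw;                   // const conversion: read-only access
    return n;
}
```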

Why you can't just const_cast your way out

A common temptation: "I have a const method, but I know the storage isn't really const — let me just const_cast *this and return a mutable reference."

int& at(size_t i) const {
    return const_cast<int&>(data[i]);  // compiles, but hands out write access from a const method
}

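Writing through that reference is undefined behavior whenever the storage belongs to an object that was itself defined const. If Buffer held its ints directly (say int data[16];), a const Buffer b; at file scope could be placed in read-only memory, and the write would segfault or be silently optimized away. With a std::vector the elements live on the heap, so the write happens to land, but callers and the optimizer are still entitled to assume a const Buffer does not change. The const qualifier isn't just a hint; it's a guarantee others are allowed to rely on. Lying with const_cast lets you compile a program that the compiler is permitted to break.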

The right move is the dual overload above. Slightly more code, no undefined behavior, no lies.

mutable — the well-defined escape hatch

There's a legitimate case where a const method does modify state: caching, lazy initialization, mutex locking. C++ has a keyword for it:

class Lookup {
    mutable std::mutex m;
    std::unordered_map<int, std::string> cache;
public:
    std::string find(int k) const {
        std::lock_guard<std::mutex> g(m);  // OK: m is mutable
        return cache.at(k);
    }
};

mutable says "this field is allowed to change even from const methods" — usually because the change isn't observable (a mutex lock/unlock pair, a memoized result that returns the same value either way). Use sparingly; overuse defeats the point of const in the first place.
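
The memoization case looks like this in a minimal sketch. Circle and its members are illustrative names; the cache and the computation counter are mutable because filling them does not change what area() returns.

```cpp
#include <cassert>
#include <optional>

class Circle {
    double radius;
    mutable std::optional<double> cached_area;   // filled lazily
    mutable int computations = 0;                // bookkeeping only
public:
    explicit Circle(double r) : radius(r) {}

    // Logically const: same result every time, computed at most once.
    double area() const {
        if (!cached_area) {
            cached_area = 3.14159265358979 * radius * radius;
            ++computations;
        }
        return *cached_area;
    }

    int times_computed() const { return computations; }
};
```

Calling area() twice on a const Circle performs the multiplication once; the second call returns the stored value.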

Where this appears in persistent<T>

The primitive specialization has three accessor-style members: load(), operator T&(), and operator->(). Originally none were const-qualified, which meant that any const method on a wrapper containing a persistent<int> field (e.g. a Stack::print() const reading size) failed to compile. The fix:

  • load() returns by value → single const overload suffices.
  • operator->() returns by value → single const overload suffices.
  • operator T&() returns a reference → needs both the non-const and the new operator const T&() const overload, so *counter = 5 (write) and int n = *counter from a const context (read) both work.

The takeaway

const propagates: a single non-const accessor at the bottom of a containment chain blocks all const callers above it. Const-correctness is cheap to add at design time and expensive to retrofit — it ripples through every method signature that touches the affected type. When a function returns by value, one const overload is enough; when it returns a reference, you almost always need two. Reach for mutable only when the modification truly is unobservable; reach for const_cast essentially never.