Genesis-Embodied-AI · hughperkins · May 16, 2026
diff --git a/docs/source/user_guide/compound_types.md b/docs/source/user_guide/compound_types.md
diff --git a/docs/source/user_guide/fastcache.md b/docs/source/user_guide/fastcache.md
@@ -50,43 +50,13 @@ qd.init(arch=qd.gpu)
 # qd.init(arch=qd.gpu, print_non_pure=True)
 ```
 
-## Dataclass fields with cached values
-
-By default, for `dataclasses.dataclass` parameters, fastcache only includes the *types* of each field in the cache key, not their values. This is fine for fields like ndarrays, where the compiled kernel doesn't depend on the actual data, only the dtype and dimensionality.
-
-However, some dataclass fields hold configuration values that get baked into the compiled kernel — typically values used with `qd.static()`, such as loop bounds or feature flags:
-
-```python
-for i in qd.static(range(config.num_layers)):
-    ...
-```
-
-Here the value of `num_layers` is compiled into the kernel. Concretely the loop will be unrolled, at compile time. If `num_layers` changes, a different kernel must be compiled.
-
-Mark such fields with `add_value_to_cache_key` so their values are included in the cache key:
-
-```python
-import dataclasses
-from quadrants.lang._fast_caching import FIELD_METADATA_CACHE_VALUE
-
-@dataclasses.dataclass
-class SimConfig:
-    num_envs: int = dataclasses.field(metadata={FIELD_METADATA_CACHE_VALUE: True})
-    dt: float = dataclasses.field(metadata={FIELD_METADATA_CACHE_VALUE: True})
-    use_gravity: bool = dataclasses.field(metadata={FIELD_METADATA_CACHE_VALUE: True})
-```
-
-With this annotation, changing `num_envs` from 100 to 200 produces a different cache key so the correct compiled kernel is looked up (or compiled if not yet cached). Without it, the wrong kernel could be loaded.
-
-Note: `@qd.data_oriented` objects and `qd.Template` parameters already include primitive values in the cache key automatically — this annotation is only needed for `dataclasses.dataclass` fields.
-
 ## Constraints
 
 A kernel is eligible for fastcache only if all of the following hold:
 
 ### 1. All data flows through parameters
 
-The kernel must receive every piece of data it operates on as an explicit parameter. It must **not** capture variables from the enclosing Python scope (closures over fields, ndarrays, or mutable globals). This is the core "purity" constraint — the compiled kernel's behavior must be fully determined by its arguments.
+The kernel must receive every piece of data it operates on as an explicit parameter. It must **not** capture variables from the enclosing Python scope (closures over ndarrays, mutable globals, or any other external state). This is the core "purity" constraint — the compiled kernel's behavior must be fully determined by its arguments.
 
 ```python
 a = qd.ndarray(qd.f32, (10,))
@@ -125,8 +95,8 @@ Fastcache supports the following parameter types:
 | `qd.types.NDArray` (scalar, vector, matrix) | Yes | dtype, ndim, layout |
 | `torch.Tensor` | Yes | dtype, ndim |
 | `numpy.ndarray` | Yes | dtype, ndim |
-| `dataclasses.dataclass` | Yes | field types recursively; field values if annotated with `add_value_to_cache_key` (see [above](#dataclass-fields-with-cached-values)) |
-| `@qd.data_oriented` objects | Yes | member types and primitive member values recursively |
+| `dataclasses.dataclass` | Yes | member types recursively; member values if annotated with `FIELD_METADATA_CACHE_VALUE` (see [Appendix — compound-type cache keying](#compound-type-cache-keying)) |
+| `@qd.data_oriented` objects | Yes | member types recursively; primitive member types and values baked into kernel (see [Appendix — compound-type cache keying](#compound-type-cache-keying)) |
 | `qd.Template` primitives (int, float, bool) | Yes | type and value (baked into kernel) |
 | Non-template primitives (int, float, bool) | Yes | type only |
 | `enum.Enum` | Yes | name and value |
@@ -172,3 +142,33 @@ print(obs.cache_stored)         # True if the compiled kernel was stored to cach
 ```
 
 On the first run you'll see `cache_stored=True` but `cache_loaded=False`. On the second run (after `qd.init`), `cache_loaded=True`.
+
+## Appendix
+
+### Compound-type cache keying
+
+The args hasher walks compound-type kernel parameters recursively. For each leaf member it decides what (if anything) contributes to the cache key. The headline rules:
+
+**`@qd.data_oriented`:** the walker descends into `vars(obj)`. For each child:
+
+- `qd.ndarray` member — `(dtype, ndim, layout)` is included in the cache key. Element values are not.
+- Primitive (`int` / `float` / `bool` / `enum.Enum`) member — value is baked into the kernel (same semantics as a `qd.Template` primitive). Two instances of the same class with different primitive member values get different cache entries.
+- Nested `@qd.data_oriented` member — recurses.
+- Nested `dataclasses.dataclass` member — recurses (with the dataclass rules below).
+- `qd.field` member — fastcache is disabled for the entire kernel call. The kernel still runs via normal compilation; a warn-level log line is emitted.
+
+**`dataclasses.dataclass`:** the walker descends into the declared members. For each member, only the *type* is included in the cache key by default — **not** the value. To include a member's value, annotate it:
+
+```python
+import dataclasses
+from quadrants.lang._fast_caching import FIELD_METADATA_CACHE_VALUE
+
+@dataclasses.dataclass
+class SimConfig:
+    num_layers: int = dataclasses.field(metadata={FIELD_METADATA_CACHE_VALUE: True})
+    dt: float = dataclasses.field(metadata={FIELD_METADATA_CACHE_VALUE: True})
+```
+
+This is necessary whenever the compiled kernel depends on the member's *value* rather than just its type (for example, when the value is used as a loop bound that the compiler bakes into the generated code). Without the annotation, two `SimConfig` instances with different `num_layers` values would share a fastcache key, and the second instance would silently load a kernel compiled for the wrong value.
+
+Note the asymmetry: `@qd.data_oriented` primitive members are baked into the kernel automatically (same semantics as `qd.Template`); `dataclasses.dataclass` members contribute only their *type* to the cache key unless you opt in per-member.
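Reviewer sketch of the dataclass rule described in the appendix above: type-only keying by default, value keying only on opt-in. This is plain Python and does not call quadrants. `DEMO_CACHE_VALUE` and `toy_cache_key` are hypothetical stand-ins invented for illustration; the real `FIELD_METADATA_CACHE_VALUE` constant and the real args hasher may differ in every detail.

```python
import dataclasses

# Hypothetical stand-in for quadrants' FIELD_METADATA_CACHE_VALUE (its real value is not shown in this PR).
DEMO_CACHE_VALUE = "demo_cache_value"


@dataclasses.dataclass
class UnannotatedConfig:
    num_layers: int = 2


@dataclasses.dataclass
class AnnotatedConfig:
    num_layers: int = dataclasses.field(default=2, metadata={DEMO_CACHE_VALUE: True})


def toy_cache_key(obj) -> tuple:
    """Toy key: always the member type; the member value only when the member opts in."""
    parts = []
    for f in dataclasses.fields(obj):
        value = getattr(obj, f.name)
        if f.metadata.get(DEMO_CACHE_VALUE, False):
            parts.append((f.name, type(value).__name__, value))
        else:
            parts.append((f.name, type(value).__name__))
    return (type(obj).__name__, tuple(parts))


# Type-only keying: instances with different values collide on the same key.
unannotated_collide = toy_cache_key(UnannotatedConfig(2)) == toy_cache_key(UnannotatedConfig(8))
# Opt-in value keying: instances with different values get different keys.
annotated_collide = toy_cache_key(AnnotatedConfig(2)) == toy_cache_key(AnnotatedConfig(8))
print(unannotated_collide, annotated_collide)
```

In this toy model the unannotated configs share a key while the annotated ones do not, which is the collision the appendix warns about for values that are baked into generated code.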
diff --git a/docs/source/user_guide/tensor.md b/docs/source/user_guide/tensor.md
@@ -203,6 +203,15 @@ fill(b)   # ndarray branch
 
 The kernel argument is unwrapped to the bare impl before the template-mapper / AST sees it, so kernel bodies still write `x[i, j]` and pay no per-call cost for the wrapper.
 
+`qd.Tensor` is also the right annotation when storing a tensor as a `dataclasses.dataclass` member:
+
+```python
+@dataclass
+class State:
+    a: qd.Tensor
+    b: qd.Tensor
+```
+
 ## Pickle
 
 `qd.Tensor` objects are picklable on **both** backends, including under non-identity layouts. Round-trip (pickle then unpickle) preserves the canonical data, the dtype, the shape, and the layout:

diff --git a/python/quadrants/lang/_template_mapper.py b/python/quadrants/lang/_template_mapper.py
@@ -5,10 +5,29 @@
 from quadrants.lang import impl
 from quadrants.lang.impl import Program
 from quadrants.lang.kernel_arguments import ArgMetadata
+from quadrants.lang.util import is_data_oriented
 
 from .._test_tools import warnings_helper
 from ._kernel_types import ArgsHash
-from ._template_mapper_hotpath import _extract_arg, _primitive_types
+from ._template_mapper_hotpath import (
+    _extract_arg,
+    _primitive_types,
+    _struct_nd_paths_for,
+)
+
+
+def _collect_data_oriented_nd_ids(arg: Any, out: list) -> None:
+    """Append ``id(ndarray)`` for every ndarray reachable from ``arg``, using the per-class path cache in
+    ``_template_mapper_hotpath._struct_nd_paths_for`` so the first call walks ``vars(arg)`` once and subsequent calls
+    are just ``getattr`` chains. Empty path list short-circuits with zero work — critical for genesis's
+    ``@qd.data_oriented`` Solver passed as ``self`` to every kernel.
+    """
+    for chain in _struct_nd_paths_for(arg):
+        v = arg
+        for a in chain:
+            v = getattr(v, a)
+        out.append(id(v))
+
 
 Key: TypeAlias = tuple[Any, ...]
 
@@ -71,6 +90,17 @@ def lookup(self, raise_on_templated_floats: bool, args: tuple[Any, ...]) -> tupl
         # branching for primitive types dramatically improve performance of hash computation.
         mapping_cache_tracker: list[ReferenceType | None] | None = None
         args_hash: ArgsHash = tuple([id(arg) for arg in args])
+        # ``@qd.data_oriented`` containers can have their member ndarrays reassigned between calls on the same instance
+        # (``state.x = other_ndarray``). The id(arg) alone does not capture that, so the spec-key cache below would
+        # serve a stale entry and the new ndarray's dtype/ndim would be wrong. Fold the reachable ndarray ids into the
+        # hash. No-op for data_oriented containers that hold no ndarrays — the walker returns an empty list. See
+        # ``_collect_data_oriented_nd_ids``.
+        nd_ids: list = []
+        for arg in args:
+            if is_data_oriented(arg):
+                _collect_data_oriented_nd_ids(arg, nd_ids)
+        if nd_ids:
+            args_hash = args_hash + tuple(nd_ids)
         try:
             mapping_cache_tracker = self._mapping_cache_tracker[args_hash]
         except KeyError:

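Reviewer sketch of why the `lookup` change above folds member ndarray ids into `args_hash`. It is plain Python with stand-in classes (`FakeNdarray`, `State`, `toy_args_hash` are hypothetical names, not quadrants APIs) and assumes the attribute paths are already known per class, as the per-class path cache would provide.

```python
class FakeNdarray:
    """Stand-in for an ndarray; only object identity matters in this sketch."""


class State:
    def __init__(self, x):
        self.x = x


def toy_args_hash(args, nd_paths_by_class):
    """id() of every arg, followed by id() of each ndarray reachable via the known attribute paths."""
    base = tuple(id(arg) for arg in args)
    nd_ids = []
    for arg in args:
        for chain in nd_paths_by_class.get(type(arg), []):
            v = arg
            for attr in chain:
                v = getattr(v, attr)
            nd_ids.append(id(v))
    return base + tuple(nd_ids)


paths = {State: [("x",)]}
first, second = FakeNdarray(), FakeNdarray()  # both kept alive so their ids stay distinct
state = State(first)

plain_before = (id(state),)
folded_before = toy_args_hash((state,), paths)

state.x = second  # reassign the member on the same container instance

plain_after = (id(state),)
folded_after = toy_args_hash((state,), paths)

print(plain_before == plain_after)    # id(arg) alone cannot see the reassignment
print(folded_before == folded_after)  # the folded hash does change
```

Classes with no recorded paths contribute nothing beyond `id(arg)`, which matches the no-op behaviour the comment in `lookup` describes for containers without ndarrays.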
diff --git a/python/quadrants/lang/_template_mapper_hotpath.py b/python/quadrants/lang/_template_mapper_hotpath.py
@@ -25,6 +25,7 @@
 a consequence of inlining 'is_dataclass' and 'fields'.
 """
 
+import dataclasses
 import weakref
 from dataclasses import _FIELD, _FIELDS
 from typing import Any, Union
@@ -71,6 +72,88 @@
 _primitive_types = {int, float, bool}
 
 
+# Per-class cache: ``type(arg) -> list[tuple[str, ...]]`` of attribute paths whose values are ``Ndarray`` instances at
+# first observation. Populated lazily by ``_struct_nd_paths_for`` on the first call with each new data_oriented (or
+# nested dataclass) class. Empty list means "this class holds no ndarrays anywhere", in which case subsequent calls
+# pay only a dict-lookup per arg. Non-empty list short-circuits the full ``vars()`` recursion and just resolves each
+# cached path via ``getattr`` chains. Critical for the genesis field-backend hot path: the ``@qd.data_oriented``
+# Solver is passed as ``self`` to most kernels and holds dozens of attributes, so a full per-call ``vars()`` walk
+# cost >100ns per kernel call and hurt FPS until this cache was added.
+_struct_nd_paths_cache: dict[type, list[tuple]] = {}
+
+
+def _build_struct_nd_paths(obj: Any, prefix: tuple, out: list) -> None:
+    if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
+        children = ((f.name, getattr(obj, f.name)) for f in dataclasses.fields(obj))
+    else:
+        # ``NamedTuple`` (decorated as ``@qd.data_oriented``) has no instance ``__dict__`` — fall back to ``_asdict()``
+        # which materialises a dict view of the named fields. Mirrors the same fallback in
+        # ``args_hasher.stringify_obj_type`` so the per-class path cache here picks up ndarray members on NamedTuples
+        # too (regression covered by ``test_args_hasher_named_tuple``).
+        try:
+            children = obj._asdict().items()
+        except AttributeError:
+            children = obj.__dict__.items()
+    for k, v in children:
+        chain = prefix + (k,)
+        if type(v) in _TENSOR_WRAPPER_TYPES:
+            v = v._unwrap()
+        v_type = type(v)
+        if issubclass(v_type, Ndarray):
+            out.append(chain)
+        elif is_data_oriented(v) or (dataclasses.is_dataclass(v) and not isinstance(v, type)):
+            _build_struct_nd_paths(v, chain, out)
+
+
+def _struct_nd_paths_for(arg: Any) -> list[tuple]:
+    """Return the cached attribute paths (each a tuple of attr-name strings) at which ``Ndarray`` instances are
+    reachable from ``arg`` of type ``type(arg)``. First call for a class walks ``arg`` once via
+    ``_build_struct_nd_paths``; subsequent calls are dict-lookups.
+
+    Trades freshness for speed: assumes the *set* of ndarray-holding attribute paths is stable across instances of
+    the same class. The genesis Solver and similar ``@qd.data_oriented`` containers satisfy this — their ndarray
+    members are declared in ``__init__`` and not added later. If you need to add an ndarray attribute after the first
+    kernel launch on an instance of a given class, the new attribute won't be tracked. Call
+    ``invalidate_struct_nd_paths_for`` (below) or restart the program.
+
+    FIXME (Codex #3 on PR #704, https://github.com/Genesis-Embodied-AI/quadrants/pull/704#discussion_r3253281957):
+    the cache is keyed by ``type(arg)`` only. If two instances of the same class have *polymorphic attribute
+    structure* — e.g. instance A has ``.x`` as a ``qd.ndarray``-backed ``qd.Tensor`` while instance B has the same
+    ``.x`` as a field-backed ``qd.Tensor`` — the paths discovered from the first-walked instance are reused for the
+    second. ``_collect_struct_nd_descriptors`` then unconditionally reads ndarray-only attrs (``element_type``,
+    ``grad``, ``_qd_layout``) on what is now a ``ScalarField``, raising before the kernel can run. The fix is the
+    per-instance walk implemented on top of this branch in PR #705; this branch ships the class-level cache as-is.
+    """
+    cls = type(arg)
+    paths = _struct_nd_paths_cache.get(cls)
+    if paths is None:
+        paths = []
+        _build_struct_nd_paths(arg, (), paths)
+        _struct_nd_paths_cache[cls] = paths
+    return paths
+
+
+def _collect_struct_nd_descriptors(arg: Any, out: list) -> None:
+    """Emit per-ndarray shape descriptors ``(joined-path, element_type, ndim, needs_grad, layout)`` for every ndarray
+    reachable from ``arg``. Used by the template-mapper to refine the spec key for ``@qd.data_oriented`` args holding
+    ndarrays — see the data_oriented branch in ``_extract_arg``.
+
+    FIXME (Codex #3 on PR #704): when a polymorphic instance reuses a cached path that pointed to an ``Ndarray`` on
+    the first-walked instance, ``v`` here can be a ``ScalarField`` and the ``v.element_type`` / ``v.grad`` /
+    ``v._qd_layout`` reads will raise. See ``_struct_nd_paths_for`` above for details. Fixed in PR #705 via the
+    per-instance walk redesign.
+    """
+    for chain in _struct_nd_paths_for(arg):
+        v = arg
+        for a in chain:
+            v = getattr(v, a)
+        if type(v) in _TENSOR_WRAPPER_TYPES:
+            v = v._unwrap()
+        type_id = id(v.element_type)
+        element_type = type_id if type_id in primitive_types.type_ids else v.element_type
+        out.append((".".join(chain), element_type, len(v.shape), v.grad is not None, v._qd_layout))
+
+
 def _extract_arg(raise_on_templated_floats: bool, arg: Any, annotation: AnnotationType, arg_name: str) -> Any:
     # ``qd.Tensor`` wrappers passed as struct fields. Top-level kernel-arg unwrap in ``Kernel.__call__`` covers direct
     # args, but the dataclass-field recursion at the bottom of this function walks struct attributes via raw
@@ -124,7 +207,7 @@ def _extract_arg(raise_on_templated_floats: bool, arg: Any, annotation: Annotati
             raise QuadrantsRuntimeTypeError(
                 "Ndarray shouldn't be passed in via `qd.template()`, please annotate your kernel using `qd.types.ndarray(...)` instead"
             )
-        if arg_type in _composite_mutable_types or is_data_oriented(arg):
+        if arg_type in _composite_mutable_types:
             # [Composite arguments] Return weak reference to the object
             # Quadrants kernel will cache the extracted arguments, thus we can't simply return the original argument.
             # Instead, a weak reference to the original value is returned to avoid memory leak.
@@ -134,6 +217,21 @@ def _extract_arg(raise_on_templated_floats: bool, arg: Any, annotation: Annotati
             # 1. Invalid weak-ref will leave a dead(dangling) entry in both caches: "self.mapping" and "self.compiled_functions"
             # 2. Different argument instances with same type and same value, will get templatized into separate kernels.
             return weakref.ref(arg)
+        if is_data_oriented(arg):
+            # Same memory-leak avoidance as above — keep ``weakref.ref(arg)`` so the spec key never holds a strong
+            # reference to user state. But for data_oriented containers that hold ``Ndarray`` members, the live
+            # ``weakref`` alone is too coarse: same instance with ``state.x = other_ndarray`` of a different dtype/ndim
+            # would re-use the previously-compiled kernel, which was specialised for the old shape. Walk the reachable
+            # ndarrays and prepend their shape descriptors so dtype/ndim changes trigger re-specialisation. Mirrors what
+            # the dataclass branch below does via ``annotation_fields``.
+            #
+            # Containers with no ndarrays keep the original short-path (one spec per instance via weakref) so this is
+            # a no-op for the existing data_oriented + qd.field workloads (genesis field-backend).
+            nd_descriptors: list = []
+            _collect_struct_nd_descriptors(arg, nd_descriptors)
+            if nd_descriptors:
+                return (id(type(arg)), tuple(nd_descriptors), weakref.ref(arg))
+            return weakref.ref(arg)
 
         # Return value directly for other types, i.e. primitive types and all qd.Field-derived classes
         if raise_on_templated_floats and arg_type is float:

diff --git a/python/quadrants/lang/ast/ast_transformers/function_def_transformer.py b/python/quadrants/lang/ast/ast_transformers/function_def_transformer.py
@@ -34,7 +34,7 @@
 from quadrants.lang.matrix import MatrixType
 from quadrants.lang.stream import stream_parallel
 from quadrants.lang.struct import StructType
-from quadrants.lang.util import to_quadrants_type
+from quadrants.lang.util import is_data_oriented, to_quadrants_type
 from quadrants.types import annotations, buffer_view_type, ndarray_type, primitive_types
 
 
@@ -149,6 +149,21 @@ def _transform_kernel_arg(
                         field.type,
                         this_arg_features[field_idx],
                     )
+                elif isinstance(field.type, type) and getattr(field.type, "_data_oriented", False):
+                    # ``@qd.data_oriented`` field type inside a typed-dataclass kernel arg. The two patterns are
+                    # semantically incompatible at this layer: dataclass kernel-arg recursion uses annotations to
+                    # flatten leaf fields into per-leaf kernel args at compile time, but data_oriented containers don't
+                    # carry per-attribute type annotations — they need a value-driven walk
+                    # (``_predeclare_struct_ndarrays``), which only fires for ``qd.template()`` / ``qd.Tensor``
+                    # annotations. Rather than silently miscompile, raise a clear error pointing users to the
+                    # recommended pattern.
+                    raise QuadrantsSyntaxError(
+                        f"Kernel arg {argument_name!r}: field {field.name!r} has @qd.data_oriented type "
+                        f"{field.type.__name__!r}, which cannot be flattened into a typed-dataclass kernel arg. "
+                        f"Use ``{argument_name}: qd.template()`` for the outer kernel arg annotation instead; "
+                        f"data_oriented contents (including nested ndarrays) are walked at kernel-compile time via "
+                        f"the template path."
+                    )
                 else:
                     result, obj = FunctionDefTransformer._decl_and_create_variable(
                         ctx,
@@ -226,14 +241,18 @@ def _walk_obj(obj, arg_idx, path):
                         child = child._unwrap()
                     if isinstance(child, _ndarray.Ndarray):
                         _register_ndarray(child, arg_idx, (*path, field.name))
-                    elif dataclasses.is_dataclass(child) and not isinstance(child, type):
+                    elif (dataclasses.is_dataclass(child) and not isinstance(child, type)) or is_data_oriented(child):
                         _walk_obj(child, arg_idx, (*path, field.name))
             else:
                 for attr_name, attr_val in vars(obj).items():
                     if isinstance(attr_val, _TensorClass):
                         attr_val = attr_val._unwrap()
                     if isinstance(attr_val, _ndarray.Ndarray):
                         _register_ndarray(attr_val, arg_idx, (*path, attr_name))
+                    elif (dataclasses.is_dataclass(attr_val) and not isinstance(attr_val, type)) or is_data_oriented(
+                        attr_val
+                    ):
+                        _walk_obj(attr_val, arg_idx, (*path, attr_name))
 
         def _register_ndarray(nd, arg_idx, attr_chain):
             key = id(nd)
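
Reviewer sketch of the recursion that the `_walk_obj` change above enables: both the dataclass branch and the `vars(obj)` branch descend into nested containers, so ndarrays are registered with their full attribute chain. The classes and the `is_container` check are hypothetical stand-ins written for this illustration; the real code uses `is_data_oriented` and `_register_ndarray`.

```python
import dataclasses


class FakeNdarray:
    """Stand-in for an ndarray leaf."""


@dataclasses.dataclass
class Inner:
    data: FakeNdarray


class Outer:
    """Plain attribute container, standing in for a ``@qd.data_oriented`` object."""

    def __init__(self):
        self.top = FakeNdarray()
        self.inner = Inner(FakeNdarray())


def is_container(v):
    # Assumption for this sketch: a container is a dataclass instance or an Outer instance.
    return (dataclasses.is_dataclass(v) and not isinstance(v, type)) or isinstance(v, Outer)


def walk(obj, path=()):
    """Return the attribute chain of every FakeNdarray reachable from ``obj``."""
    if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
        children = [(f.name, getattr(obj, f.name)) for f in dataclasses.fields(obj)]
    else:
        children = list(vars(obj).items())
    found = []
    for name, child in children:
        if isinstance(child, FakeNdarray):
            found.append((*path, name))
        elif is_container(child):
            found.extend(walk(child, (*path, name)))
    return found


paths = walk(Outer())
print(paths)
```

Without the `elif` recursion in the `vars(obj)` branch, this toy walk would stop at `("top",)` and never reach `("inner", "data")`.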