diff --git a/ELEMENTS_LLM_CONTEXT.md b/ELEMENTS_LLM_CONTEXT.md new file mode 100644 index 00000000..829f4221 --- /dev/null +++ b/ELEMENTS_LLM_CONTEXT.md @@ -0,0 +1,1440 @@ +# ELEMENTS Code Context Reference + +This document is an AI-agent skill for the [ELEMENTS](https://github.com/lanl/ELEMENTS) C++ library. It provides the public API, mental model, conventions, pitfalls, and end-to-end usage patterns required to write correct code against ELEMENTS, plus a maintainer-oriented section for editing the library itself. + +ELEMENTS is built on top of [MATAR](https://github.com/lanl/MATAR). **This document assumes the MATAR skill is loaded separately** and only cross-references MATAR types/macros where an ELEMENTS-specific pattern requires it. Do not duplicate MATAR content here; consult the MATAR skill for `MATAR_INITIALIZE`, `FOR_ALL`, `CArrayDual`, `MPICArrayKokkos`, `CommunicationPlan`, fence rules, host/device sync, etc. + +--- + +## Table of Contents + +1. [Purpose and Prerequisites](#1-purpose-and-prerequisites) +2. [Setup and Boilerplate](#2-setup-and-boilerplate) +3. [Data Type Taxonomy](#3-data-type-taxonomy) +4. [Mesh Lifecycle (`swage::Mesh`)](#4-mesh-lifecycle-swagemesh) +5. [Reference Elements and Quadrature](#5-reference-elements-and-quadrature) +6. [Geometry Helpers (`shapes.h`)](#6-geometry-helpers-shapesh) +7. [Mesh Decomposition Workflow](#7-mesh-decomposition-workflow) +8. [State and Halo Communication](#8-state-and-halo-communication) +9. [Mesh I/O Patterns](#9-mesh-io-patterns) +10. [End-to-End Walkthrough: `mesh_decomp_example.cpp`](#10-end-to-end-walkthrough-mesh_decomp_examplecpp) +11. [Common Pitfalls](#11-common-pitfalls) +12. [Maintainer / Internals](#12-maintainer--internals) +13. [Build and Configuration](#13-build-and-configuration) +14. [Ground Truth Constraints for LLMs](#14-ground-truth-constraints-for-llms) +15. [LLM Output Contract](#15-llm-output-contract) + +--- + +## 1. Purpose and Prerequisites + +ELEMENTS is a collection of small, header-only C++ sub-libraries for implementing low- and high-order numerical methods (continuous/discontinuous FEM, finite-volume) on 2D/3D unstructured meshes. The library is data-oriented: there are very few classes, mostly plain-old-data structs holding MATAR arrays and free functions that operate on them. + +### Sub-libraries + +| Sub-library | Header | Role | +|-------------|--------|------| +| **elements** | [src/elements/ref_elem.h](src/elements/ref_elem.h), [ref_quadrature.h](src/elements/ref_quadrature.h), [ref_surf_elem.h](src/elements/ref_surf_elem.h) | Reference element math: tensor-product Lagrange basis on Gauss-Lobatto / Gauss-Legendre points | +| **swage** | [src/swage/unstructured_mesh.h](src/swage/unstructured_mesh.h) | `swage::Mesh` — unstructured 2D/3D mesh of arbitrary order with full connectivity | +| **geometry** | [src/geometry/geometry.h](src/geometry/geometry.h), [shapes.h](src/geometry/shapes.h) | Analytic primitives (`Plane`, `Circle`, `Sphere`) — **not** finite-element shape functions | +| **decomp_utilities** | [src/decomp_utilities/decomp_utils.h](src/decomp_utilities/decomp_utils.h) | PT-Scotch mesh partitioning + ghost layers + MATAR `CommunicationPlan` setup | + +The single umbrella header [src/ELEMENTS.h](src/ELEMENTS.h) pulls everything in. + +### Sub-library dependency graph + +```mermaid +flowchart TD + Kokkos[Kokkos] + MATAR[MATAR] + MPI[MPI] + Scotch[PT-Scotch] + + Quadrature[ref_quadrature.h] + RefElem[ref_elem.h] + RefSurf["ref_surf_elem.h
(body in #if 0)"] + Swage[swage::Mesh] + Geometry[geometry / shapes] + Decomp[decomp_utils] + ELEMENTS[ELEMENTS.h] + + Kokkos --> MATAR + MATAR --> Quadrature + MATAR --> RefElem + MATAR --> Swage + MATAR --> Geometry + MATAR --> Decomp + MPI --> MATAR + MPI --> Decomp + Scotch --> Decomp + Quadrature --> RefElem + + RefElem --> ELEMENTS + RefSurf --> ELEMENTS + Quadrature --> ELEMENTS + Swage --> ELEMENTS + Geometry --> ELEMENTS + Decomp --> ELEMENTS + Swage --> Decomp +``` + +### Required prerequisite skills / knowledge + +- **MATAR skill**: types like `CArrayKokkos`, `DCArrayKokkos`, `MPICArrayKokkos`, `RaggedRightArrayKokkos`, `CommunicationPlan`; macros `FOR_ALL`, `FOR_ALL_CLASS`, `FOR_REDUCE_*`, `MATAR_INITIALIZE`, `MATAR_FINALIZE`, `MATAR_FENCE`; the host/device dual-pattern and fence-placement rules. +- **MPI**: rank/size, `MPI_COMM_WORLD`, neighbor collectives. ELEMENTS' decomposition is hardcoded to `MPI_COMM_WORLD`. +- **PT-Scotch (optional)**: only if reading/modifying `decomp_utils.h` internals; consumers do not need to call Scotch directly. + +### Namespaces used + +| Namespace | Contents | +|-----------|----------| +| `mtr::` | All MATAR types and macros (brought in via `using namespace mtr;`) | +| `swage::` | `Mesh` (the central mesh struct) and the small index functor structs | +| `elements::` | `partition_mesh`, `naive_partition_mesh`, `build_ghost`; the free quadrature helpers in `ref_quadrature.h` | +| `geometry::` | `Plane`, `Circle`, `Sphere` | +| `mesh_init::` | `elem_name_tag` enum | +| `mesh_input::` | `source` and `type` enums (used by example `mesh_input_t`) | + +--- + +## 2. Setup and Boilerplate + +### Single header for consumers + +```cpp +#include "ELEMENTS.h" +``` + +This header includes MATAR plus every ELEMENTS sub-library header. See [src/ELEMENTS.h](src/ELEMENTS.h): + +```1:20:src/ELEMENTS.h +#ifndef ELEMENTS_LIBRARY_H +#define ELEMENTS_LIBRARY_H + +// --- Core Dependencies --- +#include "matar.h" + +// --- Swage Headers (Mesh & Core) --- +#include "swage/unstructured_mesh.h" +#include "elements/ref_elem.h" +#include "elements/ref_surf_elem.h" +#include "elements/ref_quadrature.h" + + +// --- Decomp Utilities --- +#include "decomp_utilities/decomp_utils.h" + +// --- Geometry Headers --- +#include "geometry/geometry.h" + +#endif // ELEMENTS_LIBRARY_H +``` + +### Initialization order is critical + +For MPI programs, **`MPI_Init` must come before `MATAR_INITIALIZE`** and **`MPI_Finalize` must come after `MATAR_FINALIZE`**. The MATAR scope braces must enclose every MATAR-owned object so destructors run before `MATAR_FINALIZE`. + +#### Template A — serial / single-rank ELEMENTS program + +```cpp +// COMPILES_AS_IS (assuming you fill in the body) +#include "ELEMENTS.h" + +using namespace mtr; + +int main(int argc, char** argv) { + MATAR_INITIALIZE(argc, argv); + { // MATAR scope + // Build a swage::Mesh, call build_connectivity, do work. + // All MATAR/ELEMENTS objects must live INSIDE this scope. + } + MATAR_FINALIZE(); + return 0; +} +``` + +#### Template B — MPI / multi-rank ELEMENTS program + +```cpp +// COMPILES_AS_IS (assuming you fill in the body) +#include +#include "ELEMENTS.h" +#include "state.h" // node_t, GaussPoint_t, State_t (from examples/*/include) +#include "mesh_io.h" // build_3d_box, write_vtu (from examples/*/include) + +using namespace mtr; + +int main(int argc, char** argv) { + MPI_Init(&argc, &argv); + MATAR_INITIALIZE(argc, argv); + { // MATAR scope + int rank, size; + MPI_Comm_rank(MPI_COMM_WORLD, &rank); + MPI_Comm_size(MPI_COMM_WORLD, &size); + + swage::Mesh initial_mesh; + MPICArrayKokkos initial_node_coords; + // ... build on rank 0, partition, communicate ... + } + MATAR_FINALIZE(); + MPI_Finalize(); + return 0; +} +``` + +The decomp example shows the canonical ordering: + +```32:36:examples/decomp_example/src/mesh_decomp_example.cpp +int main(int argc, char** argv) { + + MPI_Init(&argc, &argv); + MATAR_INITIALIZE(argc, argv); + { // MATAR scope +``` + +```294:297:examples/decomp_example/src/mesh_decomp_example.cpp + } // end MATAR scope + MATAR_FINALIZE(); + MPI_Finalize(); + + return 0; +``` + +### `state.h` and `mesh_io.h` are example infrastructure, not library headers + +The structs `node_t`, `GaussPoint_t`, `State_t`, the input-deck `mesh_input_t`, and the I/O functions (`build_3d_box`, `build_2d_polar`, `write_vtu`, `write_vtk`, `read_vtk_mesh`) live in [examples/decomp_example/include/](examples/decomp_example/include/) and [examples/average/include/](examples/average/include/). They are not part of the installed library. Treat them as **reference implementations** that downstream applications copy and adapt, and cite the example file path when generating code that depends on them. + +--- + +## 3. Data Type Taxonomy + +ELEMENTS exposes a small set of user-facing types. Use this table to pick the right one. + +### Mesh and state + +| Type | Defined in | Role | +|------|------------|------| +| `swage::Mesh` | [src/swage/unstructured_mesh.h](src/swage/unstructured_mesh.h) | The mesh: counts, connectivity, MPI ownership metadata. Two mutually exclusive init paths (linear vs arbitrary-order tensor). | +| `node_t` | [examples/*/include/state.h](examples/decomp_example/include/state.h) | Per-node state wrapper holding `MPICArrayKokkos` for `coords`, `coords_n0`, `scalar_field`, `vector_field`. | +| `GaussPoint_t` | [examples/*/include/state.h](examples/decomp_example/include/state.h) | Per-cell/Gauss-point state: `MPICArrayKokkos` for `fields` and `fields_vec`. | +| `State_t` | [examples/*/include/state.h](examples/decomp_example/include/state.h) | Bundle: `node_t node;` + `GaussPoint_t GaussPoints;`. | +| `mesh_input_t` | [examples/*/include/mesh_inputs.h](examples/decomp_example/include/mesh_inputs.h) | Input-deck struct: `num_dims`, `source`, `file_path`, `type`, `origin[3]`, `length[3]`, `num_elems[3]`, `p_order`, polar params, scale, `object_ids`. | + +### Reference element / quadrature + +| Type / function | Defined in | Role | +|-----------------|------------|------| +| `fe_ref_elem_t` | [src/elements/ref_elem.h](src/elements/ref_elem.h) | Reference **volume** element (3D-only `init` body today). Tensor-product Lagrange basis on Gauss-Lobatto + Gauss-Legendre grids; pre-tabulates basis values and gradients. | +| `fe_ref_surf_t` | [src/elements/ref_surf_elem.h](src/elements/ref_surf_elem.h) | **Currently a placeholder.** The struct compiles to an empty type because lines 19-2118 are wrapped in `#if 0`. Do not call any methods on it. | +| `elements::get_lobatto_nodes_1D`, `get_lobatto_weights_1D`, `get_legendre_nodes_1D`, `get_legendre_weights_1D` | [src/elements/ref_quadrature.h](src/elements/ref_quadrature.h) | Free `static KOKKOS_FUNCTION` helpers that fill a pre-allocated `CArrayKokkos` with 1D quadrature data. Stateless. | + +### Analytic geometry + +| Type | Defined in | Role | +|------|------------|------| +| `geometry::Plane` | [src/geometry/shapes.h](src/geometry/shapes.h) | Infinite plane; method `is_point_on_plane(point)`. | +| `geometry::Circle` | [src/geometry/shapes.h](src/geometry/shapes.h) | Circle in space; method `is_point_on_circle(point)`. | +| `geometry::Sphere` | [src/geometry/shapes.h](src/geometry/shapes.h) | 3D sphere; methods `is_point_on_sphere`, `is_point_inside_sphere`, `signed_distance_to_sphere`. | + +These are **analytic predicates**, not finite-element shape functions. There is no `Hex8`, `Tet4`, `Quad4`, etc. in this library. + +### Decomposition entry points + +| Function | Defined in | Role | +|----------|------------|------| +| `elements::naive_partition_mesh(...)` | [src/decomp_utilities/decomp_utils.h](src/decomp_utilities/decomp_utils.h) | Round-robin scatter from rank 0 to all ranks (no Scotch). | +| `elements::partition_mesh(...)` | [src/decomp_utilities/decomp_utils.h](src/decomp_utilities/decomp_utils.h) | High-level driver: naive partition → PT-Scotch → redistribute → ghost layer. The function consumers normally call. | +| `elements::build_ghost(...)` | [src/decomp_utilities/decomp_utils.h](src/decomp_utilities/decomp_utils.h) | Identifies shared/ghost entities and fills two `CommunicationPlan`s (element + node). | + +### MATAR types you will see most + +The MATAR types most frequently used in ELEMENTS code (full reference in the MATAR skill): + +| Type | Used for | +|------|---------| +| `DCArrayKokkos` | Connectivity tables (`nodes_in_elem`, `local_to_global_*_mapping`, boundary IDs). Dual host/device. | +| `CArrayKokkos` / `CArrayKokkos` | Stride arrays, surface/patch lists, basis tables, quadrature points. Device. | +| `RaggedRightArrayKokkos` | Variable-length adjacency: `elems_in_elem`, `corners_in_node`, `nodes_in_node`, `bdy_patches_in_set`. | +| `MPICArrayKokkos` | Halo-aware fields: node coordinates, `node_t` / `GaussPoint_t` members. Wraps a `DCArrayKokkos` and a `CommunicationPlan`. | +| `CArrayDual` | Small two-sided arrays used during partitioning (`elems_in_elem_on_rank`). | +| `CommunicationPlan` | MPI neighbor topology; one for elements, one for nodes. | +| `ViewCArrayKokkos` | Used by `geometry::Plane/Circle/Sphere` to wrap caller-owned coordinate buffers. | + +--- + +## 4. Mesh Lifecycle (`swage::Mesh`) + +### The struct is `swage::Mesh` (not `mesh_t`) + +The closing comment in the header reads `}; // end Mesh_t` but the actual type is `swage::Mesh`. Always type `swage::Mesh`. + +```217:226:src/swage/unstructured_mesh.h +struct Mesh +{ + // ******* Entity Definitions **********// + // Element: A hexahedral volume + // Zone: A discretization of an element base on subdividing the element using the nodes + // Node: A kinematic degree of freedom + // Surface: The 2D surface of the element + // Patch: A discretization of a surface by subdividing the surface using the nodes + // Corner: A element-node pair +``` + +### Entity vocabulary + +| Entity | Meaning | Notes | +|--------|---------|-------| +| **Element** | Volume cell (hex in 3D, quad in 2D) | Index variable: `elem_gid`. The primary volumetric unit. | +| **Node** | Kinematic DOF | Index: `node_gid`. For high-order meshes there are many nodes per element, not just 8 vertices. `num_nodes_in_elem = (Pn+1)^num_dims` for arbitrary-order; `2^num_dims` for linear. | +| **Zone** | Subcell partition of an element | `num_zones_in_elem = Pn^num_dims`. Each zone has `2^num_dims` nodes. Used in DG-style schemes. | +| **Surface** | 2D interface (or 1D edge in 2D) of the element | `num_surfs_in_elem` is 4 (2D) or 6 (3D). | +| **Patch** | Subdivision of a surface | `num_nodes_in_patch = 2*(num_dims - 1)` (2 in 2D, 4 in 3D). Interior patches are paired across adjacent elements; boundary patches are collected in `bdy_patches`. | +| **Corner** | (element, local-node) pair | Global corner id = `node_lid + elem_gid * num_nodes_in_elem`. | +| **Gauss point** | Volume quadrature point | `gauss_in_elem(elem_gid, gauss_lid) →` global Gauss id; populated via `gauss_in_elem_t` functor. | +| **Lobatto point** | Material/interface point | `lobatto_in_elem` functor exists; some Lobatto wiring in `build_patch_connectivity` is incomplete (commented out). | + +The two element-formulation tags live in `mesh_init`: + +```44:55:src/swage/unstructured_mesh.h +namespace mesh_init +{ +// element mesh types +enum elem_name_tag +{ + linear_simplex_element = 0, + linear_tensor_element = 1, + arbitrary_tensor_element = 2 +}; + +// other enums could go here on the mesh +} // end namespace +``` + +`linear_simplex_element` is reserved but not actively used by the build routines today — the `build_patch_connectivity` switches between linear-tensor (hex/quad with fixed local-node tables) and arbitrary-tensor (Pn-order, tensor indexing). + +### Hex node ordering (linear, 0-7) + +For `linear_tensor_element` in 3D, the hex local-node ordering is documented at the top of [src/swage/unstructured_mesh.h](src/swage/unstructured_mesh.h): + +```57:94:src/swage/unstructured_mesh.h +/* +========================== +Nodal indexing convention +========================== + + K + ^ J + | / + | / + | / + 6------------------7 + /| /| + / | / | + / | / | + / | / | + / | / | +4------------------5 | +| | | | ----> I +| | | | +| | | | +| | | | +| 2------------|-----3 +| / | / +| / | / +| / | / +| / | / +|/ |/ +0------------------1 + +nodes are ordered for outward normal +patch 0: [0,4,6,2] xi-minus dir +patch 1: [1,3,7,5] xi-plus dir +patch 2: [0,1,5,4] eta-minus dir +patch 3: [3,2,6,7] eta-plus dir +patch 4: [0,2,3,1] zeta-minus dir +patch 6: [4,5,7,6] zeta-plus dir +*/ +``` + +This is **not** standard VTK or Ensight order. The `mesh_io.h` readers/writers in the examples include explicit reordering tables (e.g. an "Ensight to IJK" permutation in `read_vtk_mesh`). + +### Mesh build protocol + +The lifecycle is a strict ordered protocol. You must: + +1. Set the node count. +2. Choose **exactly one** of the two `initialize_elems*` paths and fill `nodes_in_elem` with **global** node IDs. +3. Call `build_connectivity()`. + +```mermaid +flowchart TD + A[initialize_nodes(N)] --> B{High order?} + B -- "p_order == 1
(linear hex/quad)" --> C[initialize_elems(E, dim)] + B -- "p_order > 1
(arbitrary tensor)" --> D[initialize_elems_Pn(E, dim, Pn)] + C --> E[Fill nodes_in_elem(elem_gid, node_lid)
with global node IDs] + D --> E + E --> F[build_connectivity()] + F --> G[build_corner_connectivity] + F --> H[build_elem_elem_connectivity] + F --> I[build_patch_connectivity] + F --> J[build_node_node_connectivity] + G --> H + H --> I + I --> J +``` + +`build_connectivity()` runs the four sub-builds in this fixed order: + +```1468:1482:src/swage/unstructured_mesh.h + void build_connectivity() + { + verbose = true; + build_corner_connectivity(); + if (verbose) printf("Built corner connectivity \n"); + + build_elem_elem_connectivity(); + if (verbose) printf("Built element-element connectivity \n"); + + build_patch_connectivity(); + if (verbose) printf("Built patch connectivity \n"); + + build_node_node_connectivity(); + if (verbose) printf("Built node-node connectivity \n"); + } +``` + +Always prefer `build_connectivity()` over calling the four `build_*` functions individually unless you have a specific reason. The functions have ordering dependencies (`elem_elem` requires `corner`; `patch` requires `elem_elem`; `node_node` requires `patch`). + +**`build_connectivity()` sets `verbose = true` as a side effect.** If you do not want the printf output, set `mesh.verbose = false` after calling it. + +### Linear vs arbitrary-order init + +```340:361:src/swage/unstructured_mesh.h + void initialize_elems(const size_t num_elems_inp, const size_t num_dims_inp) + { + num_dims = num_dims_inp; + num_nodes_in_elem = 1; + + for (int dim = 0; dim < num_dims; dim++) { + num_nodes_in_elem *= 2; + } + num_elems = num_elems_inp; + nodes_in_elem = DCArrayKokkos(num_elems, num_nodes_in_elem, "mesh.nodes_in_elem"); + corners_in_elem = CArrayKokkos(num_elems, num_nodes_in_elem, "mesh.corners_in_elem"); + + // 1 Gauss point per element + num_gauss_in_elem = 1; + + // 1 zone per element + num_zones_in_elem = 1; + + gauss_in_elem = gauss_in_elem_t(num_gauss_in_elem); + + return; + }; // end method +``` + +```364:394:src/swage/unstructured_mesh.h + void initialize_elems_Pn( + const size_t num_elems_inp, + const size_t num_dims_inp, + const size_t Pn_order) + { + + + elem_kind = mesh_init::arbitrary_tensor_element; + + num_dims = num_dims_inp; + num_elems = num_elems_inp; + + Pn = Pn_order; + + num_nodes_in_elem = std::pow(Pn_order + 1, num_dims); //(Pn_order + 1)**num_dims; // (Pn +1) + num_nodes_in_zone = std::pow(2, num_dims); // (4, or 8, always) + num_gauss_in_elem = std::pow(2*Pn_order, num_dims); // = 2*Pn + num_zones_in_elem = std::pow(Pn_order, num_dims); // Pn + num_surfs_in_elem = num_dims == 2 ? 4 : 6; // 4 or 6 (always) + + num_zones = num_zones_in_elem * num_elems; + + nodes_in_elem = DCArrayKokkos(num_elems, num_nodes_in_elem, "mesh.nodes_in_elem"); + corners_in_elem = CArrayKokkos(num_elems, num_nodes_in_elem, "mesh.corners_in_elem"); + zones_in_elem = zones_in_elem_t(num_zones_in_elem); + surfs_in_elem = CArrayKokkos(num_elems, num_surfs_in_elem, "mesh.surfs_in_zone"); + nodes_in_zone = CArrayKokkos(num_zones, num_nodes_in_zone, "mesh.nodes_in_zone"); + gauss_in_elem = gauss_in_elem_t(num_gauss_in_elem); + + return; + }; // end method +``` + +Choose based on `Pn_order`: `1` → `initialize_elems`; `>= 2` → `initialize_elems_Pn`. Mixing the two on the same `Mesh` is undefined — `initialize_elems` does not set `elem_kind = arbitrary_tensor_element` and does not allocate `surfs_in_elem` / `nodes_in_zone`. + +### Filling `nodes_in_elem` + +After init, write into `nodes_in_elem` on the host using `.host(elem_gid, node_lid) = node_gid` (it is a `DCArrayKokkos`). Then push to device with `nodes_in_elem.update_device()` if you need device access. The example builders (`build_3d_box`, `build_2d_polar`) handle this for you — see the `mesh_io.h` reference in [examples/decomp_example/include/mesh_io.h](examples/decomp_example/include/mesh_io.h). + +### Connectivity arrays (read-only after build) + +| Name | Type | Shape / meaning | +|------|------|-----------------| +| `nodes_in_elem` | `DCArrayKokkos` | `(num_elems, num_nodes_in_elem)` — global node IDs | +| `corners_in_elem` | `CArrayKokkos` | `(num_elems, num_nodes_in_elem)` — corner IDs | +| `corners_in_node` | `RaggedRightArrayKokkos` | per node, list of corner GIDs | +| `num_corners_in_node` | `CArrayKokkos` | `(num_nodes,)` | +| `elems_in_node` | `RaggedRightArrayKokkos` | per node, elements touching it | +| `elems_in_elem` | `RaggedRightArrayKokkos` | per element, neighboring element GIDs | +| `num_elems_in_elem` | `CArrayKokkos` | per-element neighbor counts | +| `nodes_in_node` | `RaggedRightArrayKokkos` | per node, neighbor nodes along edges | +| `num_nodes_in_node` | `CArrayKokkos` | per-node neighbor counts | +| `patches_in_elem`, `surfs_in_elem` | `CArrayKokkos` | element-side patch/surface tables | +| `nodes_in_patch`, `elems_in_patch`, `surf_in_patch` | `CArrayKokkos` | patch-side connectivity | +| `nodes_in_surf`, `elems_in_surf`, `patches_in_surf` | `CArrayKokkos` | surface-side connectivity | +| `nodes_in_zone` | `CArrayKokkos` | `(num_zones, num_nodes_in_zone)` — only valid after `initialize_elems_Pn` | +| `bdy_patches`, `bdy_nodes` | `CArrayKokkos` | flat boundary lists | +| `bdy_patches_in_set`, `bdy_nodes_in_set` | `RaggedRightArrayKokkos` | per boundary set; **populated by application code in `tag_bdys` (see comment in `init_bdy_sets`), not by `build_connectivity`** | + +### Index functors + +`zones_in_elem`, `gauss_in_elem`, `lobatto_in_elem` are not arrays — they are tiny stride-flattening structs: + +```115:139:src/swage/unstructured_mesh.h +struct zones_in_elem_t +{ + private: + size_t num_zones_in_elem_; + public: + zones_in_elem_t() { + }; + + zones_in_elem_t(const size_t num_zones_in_elem_inp) { + this->num_zones_in_elem_ = num_zones_in_elem_inp; + }; + + // return global zone index for given local zone index in an element + size_t host(const size_t elem_gid, const size_t zone_lid) const + { + return elem_gid * num_zones_in_elem_ + zone_lid; + }; + + // Return the global zone ID given an element gloabl ID and a local zone ID + KOKKOS_INLINE_FUNCTION + size_t operator()(const size_t elem_gid, const size_t zone_lid) const + { + return elem_gid * num_zones_in_elem_ + zone_lid; + }; +}; +``` + +Use them as `mesh.zones_in_elem(elem_gid, zone_lid)` inside device kernels and `mesh.zones_in_elem.host(elem_gid, zone_lid)` on the host. `gauss_in_elem_t` and `lobatto_in_elem_t` follow the same pattern. + +### Boundary sets + +```cpp +// COMPILES_AS_IS (after building mesh) +mesh.init_bdy_sets(num_bcs); +// At this point only num_bdy_sets and num_bdy_patches_in_set are allocated. +// Filling bdy_patches_in_set and bdy_nodes_in_set is the application's job +// (see geometry_new.cpp pattern in the legacy code). +``` + +The header explicitly defers ragged boundary set construction to a `tag_bdys`-style function written outside `swage::Mesh`. Do not assume `build_connectivity` populates these. + +### MPI ownership fields (filled by `decomp_utils`) + +After a successful call to `elements::partition_mesh`, the `final_mesh` will have: + +| Field | Meaning | +|-------|---------| +| `local_to_global_node_mapping` | `DCArrayKokkos`, length `num_nodes` | +| `local_to_global_elem_mapping` | `DCArrayKokkos`, length `num_elems` | +| `num_owned_elems` / `num_owned_nodes` | Count of owned (non-ghost) entities; ranks `[0, num_owned_*)` are owned, `[num_owned_*, num_*)` are ghost | +| `num_boundary_elems` / `num_boundary_nodes` | Count of locally-owned entities that are sent to neighbors | +| `boundary_elem_local_ids` / `boundary_node_local_ids` | `DCArrayKokkos` of local IDs for entities that send data | +| `num_ghost_elems` / `num_ghost_nodes` | Count of received entities | + +For a single-rank fallback (`world_size == 1`), set these manually: + +```129:133:examples/decomp_example/src/mesh_decomp_example.cpp + final_mesh = initial_mesh; + final_mesh.num_owned_elems = initial_mesh.num_elems; + final_mesh.num_owned_nodes = initial_mesh.num_nodes; + final_node_coords = initial_node_coords; +``` + +--- + +## 5. Reference Elements and Quadrature + +### `fe_ref_elem_t` (volume) + +Defined in [src/elements/ref_elem.h](src/elements/ref_elem.h) as a plain struct of `size_t` counts plus `CArrayKokkos` / `CArrayKokkos` tables. Construct with default constructor, then call `init(p_order, num_dim_inp)` exactly once: + +```cpp +// PSEUDOCODE_PATTERN +fe_ref_elem_t ref_elem; +ref_elem.init(p_order, 3); // 3D today +``` + +**Important caveat:** the body of `init` that fills the 3D tensor-product point/weight/basis tables is guarded by `if (num_dim == 3)`. Calling `init(p_order, 2)` will set the 1D counts and allocate arrays but will leave the 3D tables un-populated. **Treat `fe_ref_elem_t` as 3D-only for now.** + +#### Counts (after `init`) + +| Field | Meaning (for `p > 0`) | +|-------|-----------------------| +| `num_dofs_1d` | `p_order + 1` (CG kinematic DOFs in 1D) | +| `num_dg_dofs_1d` | `p_order` (DG / "thermo" DOFs in 1D) | +| `num_lobotto_1d` | `2*p_order + 1` (Gauss-Lobatto points in 1D) | +| `num_dual_lobotto_1d` | `2*p_order - 1` | +| `num_gauss_1d` | `2*p_order` (Gauss-Legendre in 1D) | +| `num_zones_1d` | `p_order` | +| `num_dofs_in_elem`, `num_lobotto_in_elem`, `num_gauss_in_elem`, `num_zones_in_elem`, ... | Each = corresponding 1D count to the power `num_dim`. | + +For `p_order == 0` the special case is `num_lobotto_1d = 2`, `num_dofs_1d = 2`, `num_zones_1d = 1`, `num_gauss_1d = 1`. + +#### Pre-tabulated arrays (read after `init`) + +| Field | Shape | Contents | +|-------|-------|----------| +| `lob_points_1D`, `dual_lob_points_1D`, `leg_nodes_1D` | `(num_*_1d,)` | 1D abscissas | +| `lob_weights_1D`, `leg_weights_1D` | `(num_*_1d,)` | 1D weights | +| `lobotto_point_positions`, `gauss_point_positions` | `(num_*_in_elem, num_dim)` | 3D quadrature points | +| `lobotto_point_weights`, `gauss_point_weights` | `(num_*_in_elem,)` | 3D quadrature weights | +| `dof_positions`, `dof_positions_1d` | DOF coordinates (CG basis) | +| `dg_dof_positions`, `dg_dof_positions_1d` | DOF coordinates (DG basis) | +| `lobotto_point_basis`, `gauss_point_basis` | `(num_*_in_elem, num_basis)` | CG basis values at quadrature points | +| `lobotto_point_dg_basis`, `gauss_point_dg_basis` | `(num_*_in_elem, num_dg_basis)` | DG basis values | +| `lobotto_point_grad_basis`, `gauss_point_grad_basis` | gradients in reference coords | +| `dof_lobatto_map`, `dual_dof_lobatto_map` | `CArrayKokkos` | Maps from DOF to Lobatto-grid index | + +#### Index helpers (KOKKOS_INLINE_FUNCTION) + +Defined inside `fe_ref_elem_t`: + +| Method | Returns | +|--------|---------| +| `dof_rid(i, j, k)` | Flat CG DOF index from i,j,k tensor coordinates | +| `elem_dof_rid(i, j, k)` | Flat DG DOF index | +| `lobatto_rid(i, j, k)` | Flat Lobatto quadrature point index | +| `dual_lobatto_rid(i, j, k)` | Flat dual Lobatto index | +| `legendre_rid(i, j, k)` | Flat Gauss-Legendre point index | +| `legendre_rid_2D(i, j)` | 2D legendre index helper | + +#### On-the-fly basis evaluation + +For evaluating the basis (or its gradient) at an arbitrary reference point, call: + +| Method | Purpose | +|--------|---------| +| `get_basis(basis, val_1d, val_3d, point)` | CG basis at `point` | +| `get_elem_basis(basis, val_1d, val_3d, point)` | DG basis at `point` | +| `partial_xi_basis(partial_xi, val_1d, val_3d, Dval_1d, Dval_3d, point)` | ∂/∂ξ of CG basis | +| `partial_eta_basis(...)` | ∂/∂η | +| `partial_mu_basis(...)` | ∂/∂μ (the third reference coord) | +| `lagrange_basis_1D(interp, x_point)` | 1D Lagrange interpolant at `x_point` (CG nodes) | +| `lagrange_elem_basis_1D(interp, x_point)` | 1D Lagrange interpolant at `x_point` (DG nodes) | +| `lagrange_derivative_1D(derivative, x_point)` | 1D Lagrange derivative | + +The caller supplies workspace buffers as `CArrayKokkos`: +- `val_1d` length `num_dofs_1d` +- `val_3d` shape `(num_dofs_1d, 3)` +- `Dval_1d`, `Dval_3d` for derivatives, same shapes +- `point` length 3 +- `basis` length `num_basis` + +### `fe_ref_surf_t` is currently inactive + +```cpp +// File: src/elements/ref_surf_elem.h +// Lines 17-18: struct opens +// Line 19: #if 0 // ENTIRE BODY DISABLED +// ... +// Line 2118: #endif +// Line 2120: }; // empty struct closes +``` + +The compiled type `fe_ref_surf_t` has **no data members and no methods**. It still compiles into [src/ELEMENTS.h](src/ELEMENTS.h), but instantiating it gives you an empty struct. **Do not generate code that calls any method on `fe_ref_surf_t`.** If a 2D surface basis is required, either (a) use the 1D / 2D parts of `fe_ref_elem_t` directly, or (b) flag this as a known gap to the user. + +### `ref_quadrature.h` — free helpers + +The header is a `namespace elements { ... }` of stateless `static KOKKOS_FUNCTION` writers that fill a pre-allocated 1D buffer: + +```cpp +// PSEUDOCODE_PATTERN +using namespace mtr; +CArrayKokkos nodes(N, "nodes_1D"); +CArrayKokkos weights(N, "weights_1D"); + +elements::get_lobatto_nodes_1D(nodes, N); +elements::get_lobatto_weights_1D(weights, N); + +elements::get_legendre_nodes_1D(nodes, N); +elements::get_legendre_weights_1D(weights, N); +``` + +These are large hard-coded if-cascades on `N`. Supported `N` values are bounded by what is tabulated in the source — check [src/elements/ref_quadrature.h](src/elements/ref_quadrature.h) for the exact list before generating code with an unusual order. + +--- + +## 6. Geometry Helpers (`shapes.h`) + +`geometry::Plane`, `geometry::Circle`, `geometry::Sphere` are analytic primitives, not FE shapes. Each constructor takes `ViewCArrayKokkos` views of caller-owned position (and normal/radius) buffers. All predicate methods are `KOKKOS_INLINE_FUNCTION` and assume 1D views with extents matching `num_dims`. + +### Constructors + +```cpp +// COMPILES_AS_IS (assuming MATAR scope) +using namespace mtr; + +double pos_data[3] = {0.0, 0.0, 0.0}; +double normal_data[3] = {0.0, 0.0, 1.0}; +ViewCArrayKokkos pos(pos_data, 3); +ViewCArrayKokkos normal(normal_data, 3); + +geometry::Plane plane(pos, normal); +geometry::Circle circle(pos, normal, /*radius=*/1.0); +geometry::Sphere sphere(pos, /*radius=*/1.0); +``` + +The `Plane` and `Circle` constructors normalize the supplied normal in place; if the input normal has zero length, `Plane` throws `std::runtime_error` while `Circle` falls back to `(1, 0, 0)`. `Sphere` requires `position.extent() == 3`. + +### Predicates + +| Method | Type | Notes | +|--------|------|-------| +| `Plane::is_point_on_plane(point)` | `bool` | `|n . (p - x0)| < epsilon` | +| `Circle::is_point_on_circle(point)` | `bool` | On plane AND distance < radius | +| `Sphere::is_point_on_sphere(point)` | `bool` | `|distance - radius| < epsilon` | +| `Sphere::is_point_inside_sphere(point)` | `bool` | `distance < radius` | +| `Sphere::signed_distance_to_sphere(point)` | `double` | `distance - radius` | + +`epsilon` defaults to `1e-7` per object and is a public field if you need to override it before calling. + +### What does NOT exist + +There is no `Hex8`, `Tet4`, `Quad4`, `evaluate_basis(xi, eta)`, `jacobian()`, or `node_coords()` API in `geometry/`. Do not invent them. For FE basis evaluation use `fe_ref_elem_t` (Section 5). + +--- + +## 7. Mesh Decomposition Workflow + +`decomp_utils.h` provides three free functions in `namespace elements`. The one consumers normally call is `partition_mesh`; the other two are exposed building blocks. + +### Three entry points + +| Function | When to call | +|----------|-------------| +| `naive_partition_mesh` | Block-distribute global mesh (rank 0) into contiguous element ranges across all ranks. Minimal communication setup. Exposed mostly for `partition_mesh`'s internal use. | +| `partition_mesh` | The high-level driver. Handles naive partition → PT-Scotch repartition → halo construction → fills both `CommunicationPlan`s. **This is the one to call.** | +| `build_ghost` | Given an already-partitioned mesh, identify shared-boundary and ghost entities and populate two `CommunicationPlan`s. Called by `partition_mesh`. | + +### `partition_mesh` signature + +```1712:1720:src/decomp_utilities/decomp_utils.h +inline void partition_mesh( + swage::Mesh& initial_mesh, + swage::Mesh& final_mesh, + MPICArrayKokkos& initial_node_coords, + MPICArrayKokkos& final_node_coords, + CommunicationPlan& element_communication_plan, + CommunicationPlan& node_communication_plan, + int world_size, + int rank){ +``` + +- `initial_mesh` / `initial_node_coords`: must be valid on rank 0; can be empty/default on other ranks. +- `final_mesh` / `final_node_coords`: outputs. Will be sized for owned + ghost entities on each rank. +- `element_communication_plan`, `node_communication_plan`: must be **already initialized** with `MPI_COMM_WORLD` (see workflow below). The function fills the rest. +- `world_size`, `rank`: from `MPI_Comm_size` / `MPI_Comm_rank`. The function uses **`MPI_COMM_WORLD` exclusively** internally; `world_size` and `rank` are passed for convenience, not for routing on a non-default communicator. + +### Canonical workflow + +```mermaid +flowchart TD + A[MPI_Init then MATAR_INITIALIZE] --> B[Build initial_mesh on rank 0
Build initial_node_coords on rank 0] + B --> C[MPI_Barrier] + C --> D[Construct two CommunicationPlans
and call .initialize(MPI_COMM_WORLD)] + D --> E{world_size > 1?} + E -- "Yes" --> F[elements::partition_mesh(...)] + E -- "No" --> G[final_mesh = initial_mesh
num_owned_elems = num_elems
num_owned_nodes = num_nodes
final_node_coords = initial_node_coords] + F --> H[verify_graph_communicator on both plans] + G --> H + H --> I[Initialize node_t / GaussPoint_t
with the appropriate plan] + I --> J[Compute, communicate, write VTU] +``` + +### Code template + +```cpp +// PSEUDOCODE_PATTERN — fills in mesh build, communication, and output +#include +#include "ELEMENTS.h" +#include "state.h" +#include "mesh_io.h" + +using namespace mtr; + +int main(int argc, char** argv) { + MPI_Init(&argc, &argv); + MATAR_INITIALIZE(argc, argv); + { + int rank, world_size; + MPI_Comm_rank(MPI_COMM_WORLD, &rank); + MPI_Comm_size(MPI_COMM_WORLD, &world_size); + + swage::Mesh initial_mesh; + MPICArrayKokkos initial_node_coords; + + if (rank == 0) { + double origin[3] = {0.0, 0.0, 0.0}; + double length[3] = {1.0, 1.0, 1.0}; + int nelems[3] = {20, 20, 20}; + int Pn_order = 3; + build_3d_box(initial_mesh, initial_node_coords, origin, length, nelems, Pn_order); + } + MPI_Barrier(MPI_COMM_WORLD); + + CommunicationPlan element_plan; + element_plan.initialize(MPI_COMM_WORLD); + CommunicationPlan node_plan; + node_plan.initialize(MPI_COMM_WORLD); + + swage::Mesh final_mesh; + MPICArrayKokkos final_node_coords; + + if (world_size != 1) { + elements::partition_mesh(initial_mesh, final_mesh, + initial_node_coords, final_node_coords, + element_plan, node_plan, + world_size, rank); + } else { + final_mesh = initial_mesh; + final_mesh.num_owned_elems = initial_mesh.num_elems; + final_mesh.num_owned_nodes = initial_mesh.num_nodes; + final_node_coords = initial_node_coords; + } + + // Now use final_mesh + element_plan / node_plan to allocate state and run. + } + MATAR_FINALIZE(); + MPI_Finalize(); + return 0; +} +``` + +### What happens inside `partition_mesh` + +1. **`naive_partition_mesh`** scatters elements/connectivity/coordinates from rank 0 in contiguous blocks; produces a `naive_mesh` and CSR neighbor lists per local element. +2. **PT-Scotch** is initialized with `SCOTCH_dgraphInit(&dgraph, MPI_COMM_WORLD)`. The graph is built from the per-rank element CSR (`vertloctab` / `edgeloctab` containing global element IDs of neighbors). Strategy is `SCOTCH_STRATQUALITY` with imbalance ratio `0.001`, mapping to a `SCOTCH_archCmplt(arch, world_size)` complete-graph architecture. +3. **`SCOTCH_dgraphMap`** assigns each local element a destination rank in `partloctab`. +4. Elements, connectivity, and node coordinates are **redistributed** with `MPI_Alltoall` + `MPI_Alltoallv` into an `intermediate_mesh`. Then `intermediate_mesh.build_connectivity()` is called. +5. **`build_ghost`** identifies which owned elements/nodes are shared with which neighbor ranks, builds the ghost layer in `final_mesh`, and fills both `CommunicationPlan`s via `initialize_graph_communicator` + `setup_send_recv` (using `DRaggedRightArrayKokkos` send/recv index lists). + +After `partition_mesh` returns: +- `final_mesh.num_elems` = owned + ghost elements; `final_mesh.num_owned_elems` = owned only. +- Same for `num_nodes` / `num_owned_nodes`. +- Element local IDs `[0, num_owned_elems)` are owned, `[num_owned_elems, num_elems)` are ghost. Same for nodes. +- `local_to_global_*_mapping` is filled. +- Both `CommunicationPlan`s are ready to be attached to `MPICArrayKokkos` fields via `field.initialize_comm_plan(plan)`. + +### PT-Scotch hard-codings to be aware of + +- The communicator is **always** `MPI_COMM_WORLD`. Calling `partition_mesh` from inside a sub-communicator requires editing the source. +- The graph is **element-adjacency only** — no node-based partitioning, no weights. +- Architecture is always a complete graph of size `world_size` (one part per rank); cannot map to a hierarchical machine topology without modifying the function. + +--- + +## 8. State and Halo Communication + +State containers (`node_t`, `GaussPoint_t`) hold `MPICArrayKokkos` fields. Each field carries its own packed send/recv buffers and a pointer to a shared `CommunicationPlan`. Halo exchange is `field.communicate()`. + +### Initialization + +```59:101:examples/decomp_example/include/state.h +struct node_t +{ + + // Replace with MPIDCArrayKokkos + MPICArrayKokkos coords; ///< Nodal coordinates + MPICArrayKokkos coords_n0; ///< Nodal coordinates at tn=0 of time integration + + MPICArrayKokkos scalar_field; ///< Scalar field on a node + MPICArrayKokkos vector_field; ///< Vector field on a node + + + // initialization method (num_nodes, num_dims, state to allocate) + void initialize(size_t num_nodes, size_t num_dims, std::vector node_states) + { + + CommunicationPlan comm_plan; + + for (auto field : node_states){ + switch(field){ + case node_state::coords: + if (coords.size() == 0){ + this->coords = MPICArrayKokkos(num_nodes, num_dims, "node_coordinates"); + this->coords.initialize_comm_plan(comm_plan); + } + if (coords_n0.size() == 0){ + this->coords_n0 = MPICArrayKokkos(num_nodes, num_dims, "node_coordinates_n0"); + this->coords_n0.initialize_comm_plan(comm_plan); + } + break; + case node_state::scalar_field: + if (scalar_field.size() == 0) this->scalar_field = MPICArrayKokkos(num_nodes, "node_scalar_field"); + this->scalar_field.initialize_comm_plan(comm_plan); + break; + case node_state::vector_field: + if (vector_field.size() == 0) this->vector_field = MPICArrayKokkos(num_nodes, num_dims, "node_vector_field"); + this->vector_field.initialize_comm_plan(comm_plan); + break; + default: + std::cout<<"Desired node state not understood in node_t initialize"< n_states = { + node_state::coords, node_state::scalar_field, node_state::vector_field +}; +final_node.initialize(final_mesh.num_nodes, final_mesh.num_dims, + n_states, node_communication_plan); + +std::vector g_states = { + gauss_pt_state::fields, gauss_pt_state::fields_vec +}; +gauss_point.initialize(final_mesh.num_elems, final_mesh.num_dims, + g_states, element_communication_plan); +``` + +Note: Gauss-point-per-cell state is sized to **`final_mesh.num_elems`**, not to the per-element Gauss-point count. The example uses one logical "Gauss point" per element, sharing the element communication plan. + +### The compute → exchange → use cycle + +For every halo exchange, follow this pattern (cross-link to the MATAR fence rules): + +```cpp +// PSEUDOCODE_PATTERN +// 1. Write into owned slots [0, num_owned_*). +for (int i = 0; i < final_mesh.num_owned_elems; i++) { + gauss_point.fields.host(i) = compute_on_host(i); +} +// (Or use a FOR_ALL on device.) + +// 2. Push to device. +gauss_point.fields.update_device(); + +// 3. Exchange halos. communicate() internally calls update_host, packs send buffers, +// invokes MPI_Neighbor_alltoallv, unpacks recv buffers, and update_device's the result. +gauss_point.fields.communicate(); + +// 4. Now ghost slots [num_owned_elems, num_elems) hold fresh neighbor data. +// Run a kernel that reads owned + ghost. +FOR_ALL(i, 0, final_mesh.num_elems, { + double sum = 0.0; + for (int j = 0; j < final_mesh.num_elems_in_elem(i); j++) { + sum += gauss_point.fields(final_mesh.elems_in_elem(i, j)); + } + // ... +}); +MATAR_FENCE(); +``` + +### When do you need a `MATAR_FENCE()`? + +Apply the MATAR fence rules from the MATAR skill. The ELEMENTS-specific instances: + +| Situation | Fence required? | +|-----------|:----:| +| `FOR_ALL` writes to a device array, next `FOR_ALL` reads the same array | Yes | +| `FOR_ALL` writes a field, then call `field.communicate()` | Yes (the `update_host` inside `communicate` is host-side; without a fence the device kernel may not have finished) | +| `FOR_ALL` writes, then call `field.update_host()` | Yes | +| `field.communicate()` returns, then a `FOR_ALL` reads the field | No — `communicate()` ends with `update_device()` which fences | +| Two independent `FOR_ALL` blocks writing different arrays | No | + +The decomp example uses `MATAR_FENCE()` after every device kernel that feeds the next phase, plus `MPI_Barrier` between rank-coordinated phases (search for `MATAR_FENCE();` in [examples/decomp_example/src/mesh_decomp_example.cpp](examples/decomp_example/src/mesh_decomp_example.cpp) for concrete placements). + +--- + +## 9. Mesh I/O Patterns + +These functions are part of the example infrastructure (`examples/*/include/mesh_io.h`), not of the installed library. They are reference implementations. + +### Mesh generation + +| Function | Defined in | Signature | Purpose | +|----------|------------|-----------|---------| +| `build_3d_box` | [examples/decomp_example/include/mesh_io.h](examples/decomp_example/include/mesh_io.h) | `(swage::Mesh&, MPICArrayKokkos&, double origin[3], double length[3], int num_elems_dim[3], int Pn_order)` | Generate a 3D Cartesian box of arbitrary order on rank 0. Calls `initialize_nodes`, `initialize_elems_Pn`, fills `nodes_in_elem`, then `build_connectivity`. | +| `build_2d_polar` | [examples/decomp_example/include/mesh_io.h](examples/decomp_example/include/mesh_io.h) | `(swage::Mesh&, MPICArrayKokkos&, double& inner_radius, double& outer_radius, double& start_angle, double& end_angle, int num_elems_i, int num_elems_j)` | Generate an annular sector mesh in 2D. | +| `build_3d_box` (average variant) | [examples/average/include/mesh_io.h](examples/average/include/mesh_io.h) | Uses `node_t&` for coordinates instead of `MPICArrayKokkos&`. | Equivalent behavior tied to the `node_t` state container. | + +### Output + +| Function | Defined in | Notes | +|----------|------------|-------| +| `write_vtu(swage::Mesh&, node_t&, GaussPoint_t&, int rank, MPI_Comm comm)` | [examples/decomp_example/include/mesh_io.h](examples/decomp_example/include/mesh_io.h) | Per-rank `.vtu` file; rank 0 also writes `.pvtu`. Cell type 72 for high-order Lagrange hex, 9 for linear quad in 2D. Outputs to `vtk/` subdirectory. | +| `write_vtu` (average variant) | [examples/average/include/mesh_io.h](examples/average/include/mesh_io.h) | Same pattern, fields drawn from the average example's state. | +| `write_vtk` | [examples/average/include/mesh_io.h](examples/average/include/mesh_io.h) only | Legacy ASCII `.vtk` per rank, plus a small JSON `Fierro.vtk.series` index. Cell type 72. | + +### Input + +| Function | Defined in | Notes | +|----------|------------|-------| +| `read_vtk_mesh` | [examples/average/include/mesh_io.h](examples/average/include/mesh_io.h) only | ASCII `.vtk` reader for `POINTS` / `CELLS` / `CELL_TYPES`. Linear hex (cell type 12) is reordered from Ensight → IJK convention. After reading, call `build_connectivity`. | + +The decomp example does **not** provide a VTK reader; for that workflow, the assumption is that the global mesh is generated programmatically on rank 0. + +### Helper functions + +| Helper | Defined in | Purpose | +|--------|------------|---------| +| `split(string, delim)` | mesh_io.h | Tokenize string | +| `get_id(i, j, k, num_i, num_j)` | mesh_io.h | `i + j*num_i + k*num_i*num_j` flat index | +| `PointIndexFromIJK(i, j, k, const int* order)` | mesh_io.h | Map (i, j, k) tensor coords to a Lagrange-hex VTK local DOF index | + +### Typical I/O sequence + +```cpp +// PSEUDOCODE_PATTERN +if (rank == 0) { + build_3d_box(initial_mesh, initial_node_coords, origin, length, nelems, Pn_order); + // optional: morph node coordinates with FOR_ALL, then update_device() +} + +// ... partition, communicate ... + +write_vtu(final_mesh, final_node, gauss_point, rank, MPI_COMM_WORLD); +``` + +--- + +## 10. End-to-End Walkthrough: `mesh_decomp_example.cpp` + +This is the canonical end-to-end ELEMENTS program. Use it as the structural template for any new MPI ELEMENTS application. The file lives at [examples/decomp_example/src/mesh_decomp_example.cpp](examples/decomp_example/src/mesh_decomp_example.cpp). + +### Phase map + +```mermaid +flowchart TD + A["MPI_Init → MATAR_INITIALIZE
open MATAR scope"] --> B[Get rank and world_size
set num_dims and Pn_order] + B --> C{rank == 0?} + C -- "Yes" --> D[build_3d_box or build_2d_polar
optional FOR_ALL morph
update_device] + C -- "No" --> E[empty initial_mesh] + D --> F[MPI_Barrier] + E --> F + F --> G[Init element_plan and node_plan
with MPI_COMM_WORLD] + G --> H{world_size > 1?} + H -- "Yes" --> I[partition_mesh fills final_mesh
final_node_coords and both plans] + H -- "No" --> J[Copy initial to final
set num_owned to num_] + I --> K[verify_graph_communicator on both plans
MPI_Barrier] + J --> K + K --> L["Init GaussPoint_t with element_plan
seed owned to rank, ghost to -1
update_device, communicate, FOR_ALL average over elems_in_elem"] + L --> M["Init node_t with node_plan
seed owned and ghost
communicate scalar and vector fields
FOR_ALL smoothing over nodes_in_node (3 passes)"] + M --> N[update_host on output fields
MPI_Barrier] + N --> O["write_vtu(final_mesh, final_node, gauss_point, rank, MPI_COMM_WORLD)"] + O --> P[Print timing
close MATAR scope] + P --> Q[MATAR_FINALIZE then MPI_Finalize] +``` + +### Key code patterns from this file + +**Initial mesh build on rank 0** (lines 77-112): only rank 0 calls `build_3d_box` or `build_2d_polar`; all ranks then `MPI_Barrier`. + +**Optional high-order coordinate morph** (lines 95-101): a `FOR_ALL` over `initial_mesh.num_nodes` perturbs `initial_node_coords` on the device, followed by `initial_node_coords.update_device()`. The morph is only applied if `Pn_order > 1`. + +**Communication plan setup and partition call** (lines 119-133): two plans are constructed, both initialized on `MPI_COMM_WORLD`, and `partition_mesh` is invoked only when `world_size != 1`. The single-rank fallback manually copies the mesh and sets `num_owned_*` to the full counts. + +**Plan verification** (lines 140-141): + +```cpp +element_communication_plan.verify_graph_communicator(); +node_communication_plan.verify_graph_communicator(); +``` + +**Gauss-point ghost test** (lines 158-217): seeds owned cells with `static_cast(rank)` and ghosts with `-1.0` / `-100.0`, then `update_device → communicate → FOR_ALL average over elems_in_elem`. Demonstrates the read pattern `gauss_point.fields(final_mesh.elems_in_elem(i, j))`. + +**Node ghost test with smoothing** (lines 221-274): runs 3 smoothing passes that average each node with its `nodes_in_node` neighbors. Uses a temporary `DCArrayKokkos` buffer to avoid in-place updates inside the kernel. + +**Output** (lines 282-285): `write_vtu` is called from every rank; rank 0 also writes the `.pvtu` index inside `write_vtu`. + +**Cleanup** (lines 294-297): close the MATAR scope, then `MATAR_FINALIZE`, then `MPI_Finalize`. + +### `examples/average/src/average.cpp` is currently a stub + +```1:24:examples/average/src/average.cpp +#include +#include +#include + +// This pulls in kokkos, matar, mesh, ref_elem stuff, and PT-Scotch +#include "ELEMENTS.h" + +// IO Utilities for the mesh +#include "mesh_inputs.h" +#include "mesh_io.h" +#include "state.h" + + +int main(int argc, char** argv) { + +MATAR_INITIALIZE(argc, argv); +{ // MATAR scope + std::cout<<"Hello, Average Example!"< 1`. The fallback explicitly copies the mesh and sets `num_owned_elems` / `num_owned_nodes` to the full counts. + +6. **MATAR types use `()` indexing.** Cross-reference the MATAR skill. + +7. **`fe_ref_surf_t` body is `#if 0`.** Generate code as if the type has no public API (which it currently does not). + +8. **All MPI calls in `decomp_utils.h` use `MPI_COMM_WORLD`** regardless of arguments. There is no sub-communicator support today. + +9. **The example `mesh_io.h` files differ.** When citing `read_vtk_mesh` or `write_vtk`, use the path `[examples/average/include/mesh_io.h](examples/average/include/mesh_io.h)`. When citing `build_2d_polar` or the decomp variant of `write_vtu`, use the path `[examples/decomp_example/include/mesh_io.h](examples/decomp_example/include/mesh_io.h)`. The `state.h` and `mesh_inputs.h` files are byte-equivalent across the two examples. + +10. **`elem_kind` defaults to `linear_tensor_element`.** Only `initialize_elems_Pn` switches it to `arbitrary_tensor_element`. `linear_simplex_element` is reserved but unused. + +11. **Boundary sets require post-`build` population.** `init_bdy_sets` allocates the count array only; ragged set membership is filled by application BC-tagging code. + +--- + +## 15. LLM Output Contract + +When producing ELEMENTS code, every output must: + +- **Boilerplate** + - `#include "ELEMENTS.h"` (and `state.h` / `mesh_io.h` from the appropriate example directory if used) + - `using namespace mtr;` + - For MPI programs: `MPI_Init` → `MATAR_INITIALIZE` → MATAR scope braces → `MATAR_FINALIZE` → `MPI_Finalize` + - For serial programs: `MATAR_INITIALIZE` → MATAR scope braces → `MATAR_FINALIZE` + +- **Mesh setup** + - Always pick exactly one `initialize_elems*` path + - Always fill `nodes_in_elem` before `build_connectivity` + - Always use `swage::Mesh` (not `mesh_t`) + +- **Decomposition (multi-rank)** + - Build mesh on rank 0, others empty + - `MPI_Barrier` after rank-0 build + - Two `CommunicationPlan`s, both `.initialize(MPI_COMM_WORLD)` + - Guard `partition_mesh` with `if (world_size != 1)` and provide single-rank fallback + - Initialize `node_t` / `GaussPoint_t` with the matching `CommunicationPlan&` overload + +- **Halo exchange** + - Set owned slots → `update_device()` → `field.communicate()` → read ghost slots + - `MATAR_FENCE()` after device kernels whose output is consumed by the next host or sync operation + +- **No invented APIs** + - No methods on `fe_ref_surf_t` + - No FE shape classes from `geometry/` + - No `mesh_t`, no `partition_mesh` on a sub-communicator + - No 2D `fe_ref_elem_t::init` (state the limitation if asked) + +- **Snippet labels** + - `COMPILES_AS_IS` — buildable as written (assuming declared variables and a `MATAR_INITIALIZE` scope) + - `PSEUDOCODE_PATTERN` — contains application-specific placeholders or `// ...` ellipsis + +- **Citations** + - Cite a specific file path for every API used. When in doubt, point at one of: + - [src/swage/unstructured_mesh.h](src/swage/unstructured_mesh.h) + - [src/elements/ref_elem.h](src/elements/ref_elem.h) + - [src/elements/ref_quadrature.h](src/elements/ref_quadrature.h) + - [src/decomp_utilities/decomp_utils.h](src/decomp_utilities/decomp_utils.h) + - [src/geometry/shapes.h](src/geometry/shapes.h) + - [examples/decomp_example/src/mesh_decomp_example.cpp](examples/decomp_example/src/mesh_decomp_example.cpp) + - [examples/decomp_example/include/state.h](examples/decomp_example/include/state.h) + - [examples/decomp_example/include/mesh_io.h](examples/decomp_example/include/mesh_io.h)