Skip to content

fix(mir): thread declared mailbox capacity and overflow policy through actor lowering#2389

Merged
slepp merged 5 commits into
mainfrom
fix/mailbox-capacity
Jul 4, 2026
Merged

fix(mir): thread declared mailbox capacity and overflow policy through actor lowering#2389
slepp merged 5 commits into
mainfrom
fix/mailbox-capacity

Conversation

@slepp

@slepp slepp commented Jul 4, 2026

Copy link
Copy Markdown
Contributor

Summary

A declared mailbox capacity and overflow policy were silently discarded — every actor mailbox ran unbounded regardless of what was written in source. I thread the declared capacity and overflow policy through MIR lowering and both codegen spawn sites (direct actors and supervised children), map the overflow policy by name rather than by ordinal (the parser's and runtime's enum orderings differ, so a cast would have silently wired the wrong policy), and make coalesce fail closed with a not-yet-implemented diagnostic instead of silently discarding it.

Getting capacity to actually reach the runtime surfaced a second, more serious bug at the mailbox send seam: the -1 status code returned by a local send was overloaded to mean both "message dropped per declared policy" (silent, by spec) and "genuine send failure" (actor gone, OOM, foreign-runtime rejection). A first pass at fixing this narrowed the send-terminator's trap condition too broadly and ended up swallowing real failures along with policy-drops. The actual fix pushes the distinction down to where it belongs — hew_mailbox_send's outcome is no longer collapsed before it reaches the caller, so a policy-drop (drop_new/drop_old) resolves to Ok while every genuine failure (actor stopped, OOM, foreign-runtime boundary, fail-policy overflow) keeps a distinct, nonzero code and the send terminator traps on it as before.

Verification

  • Red-before/green-after flood tests bound mailbox delivery: direct-spawn actor floods 20 sends into a capacity-4 mailbox (20 delivered before the fix, 4 after); supervised child floods 20 into a capacity-5 mailbox (20 before, 5 after).
  • A fire-and-forget send to a dead/stopped actor now traps instead of silently succeeding.
  • overflow fail on an over-capacity mailbox now traps instead of silently dropping.
  • overflow coalesce(...) emits a clear not-yet-implemented diagnostic (E_NOT_YET_IMPLEMENTED) instead of compiling and silently discarding the policy.
  • cargo test -p hew-mir, -p hew-codegen-rs, -p hew-runtime all green; cargo clippy clean across the touched crates.
  • Validated in a full integrated preflight run alongside other in-flight work; formatting and orchestration-leak lint gates passed on push.

Out of scope

  • The coalesce overflow policy's key-function ABI itself is not implemented — it fails closed with a diagnostic rather than compiling to incorrect behaviour. Left for a follow-up.

slepp added 5 commits July 3, 2026 18:52
…h actor lowering

A declared `mailbox N overflow <policy>;` clause on an actor was silently
discarded between HIR and MIR: `ActorLayout`, `SupervisorChildLayout`, and
`Instr::SpawnActor` had no fields for it, so codegen could never see the
declared capacity or policy — every actor mailbox stayed unbounded
regardless of what the source declared. The runtime side already honours
this correctly (`hew_actor_spawn_opts`, `parse_overflow_policy`, the
supervisor spec-copy paths); only the compiler-side plumbing was missing.

- Add `mailbox_capacity: Option<u32>` and
  `overflow_policy: Option<hew_parser::ast::OverflowPolicy>` to
  `ActorLayout`, `SupervisorChildLayout`, and `Instr::SpawnActor`.
- Populate the fields at the `ActorLayout` literal from
  `actor.mailbox_capacity`/`overflow_policy`, mirror them into
  `SupervisorChildLayout` in the post-loop pass, and copy layout to instr
  in `lower_spawn_actor` — the same three hunks that already thread
  `max_heap_bytes`/`cycle_capable`.
- Fail closed on `overflow coalesce(key)`: the coalesce key-function ABI
  slice does not exist in codegen yet, so threading a bare `Coalesce` tag
  would silently miscarry the declared policy. Add
  `MirDiagnosticKind::MailboxOverflowCoalesceNotYetImplemented` and wire it
  through `hew-cli`'s diagnostic rendering (prefix, kind name, primary
  site, message, and notes) instead of remapping to another policy.
- Extend the `spawn_actor` MIR dump renderer with
  `mailbox_capacity=N overflow=<Policy>` so `--dump-mir` spot-checks the
  new fields.

`mailbox N;` with no explicit `overflow` clause still means the spec §6.2
default (`block`) — that mapping lands in the codegen commit alongside the
by-name `HewOverflowPolicy` translation.
Two codegen sites hardcoded HewActorOpts/HewChildSpec mailbox fields to
zero, discarding the capacity and policy now threaded through MIR
(previous commit) and leaving every actor mailbox unbounded regardless of
its declared "mailbox N overflow <policy>;" clause:

- Direct spawn (emit_spawn_actor): the hew_actor_spawn_opts path was
  gated on max_heap_bytes.is_some() || cycle_capable, so a mailbox-only
  actor skipped the opts struct entirely, and fields 3/4
  (mailbox_capacity/overflow) were const_zero even when the gate did
  fire. Widen the gate to include mailbox_capacity.is_some() and store
  the real values.
- Supervised children: the HewChildSpec literal stored const_zero at
  fields 5/6 unconditionally, so every supervised actor stayed unbounded
  across (re)spawns even after the direct-spawn path was fixed.

Both sites now route through one overflow_policy_to_i32 helper that maps
hew_parser::ast::OverflowPolicy to the runtime's HewOverflowPolicy
#[repr(i32)] encoding BY NAME -- the two enums declare their variants in a
different order (Block=0 in the runtime vs. DropNew-first in the
parser), so an ordinal cast would silently wire the wrong policy. None
maps to Block (spec section 6.2's declared-default), and Coalesce fails
closed with CodegenError::FailClosed as a belt-and-brace backstop behind
the MIR-side NYI diagnostic.

Fixing the zero-hardcode alone was not sufficient: a genuinely bounded
mailbox now reaches the Terminator::Send overflow path for the first
time (previously dead code, since capacity was always silently 0), and
that path traps the SENDER on ANY nonzero hew_mailbox_send status --
including ErrMailboxFull, the status shared by drop_new/drop_old
overflow. Per spec section 6.2 those policies are silent ("New message is
silently discarded" / "Oldest message ... is evicted"), so trapping the
sender would make a correctly-bounded mailbox worse than the previous
unbounded default. Narrow the trap condition to exclude ErrMailboxFull
specifically; genuine failures (actor gone, OOM, foreign runtime, remote
rejection) still trap exactly as before -- the existing
send_terminator_checks_return_status_and_traps_on_failure regression test
is unaffected.

Adds tests/vertical-slice/accept/mailbox_bounded_drop_new(_supervised).hew
(direct + supervised flood fixtures, wired into run.sh with an MIR-dump
capacity check) and
tests/vertical-slice/reject/mailbox_overflow_coalesce_nyi.hew (fail-closed
NYI fixture). Test-literal updates in hew-codegen-rs/tests/* are
mechanical fallout from the new ActorLayout/Instr::SpawnActor fields.
…e mailbox seam

The `Terminator::Send` trap condition excluded status `-1`
(`HewError::ErrMailboxFull`) to keep a declared bounded mailbox's
`drop_new`/`drop_old` overflow silent, as spec'd. But `-1` was overloaded:
`hew_actor_send_by_id` also returned it for an actor that is no longer
tracked locally (freed, stopped, or gone), so excluding it silently
swallowed that genuine failure too. A fire-and-forget send to a
permanently-stopped actor no longer trapped.

Move the policy-drop/genuine-failure distinction to where it belongs, below
`hew_actor_send_by_id`:

- `hew_mailbox_send_fire_and_forget` (new, mailbox.rs) resolves the raw send
  outcome for the no-reply-channel path only: a declared overflow
  policy-drop reports success (`0`), while every genuine failure (`fail`
  policy, actor gone, OOM) keeps its own distinct non-zero code. The
  ask/reply-channel path is untouched — an ask that silently dropped its
  message would leave the caller's reply channel waiting forever.
- `hew_actor_send_by_id` now propagates the actual status code instead of
  collapsing to a bool, and reports `ErrActorStopped` (not a bare `-1`) when
  the actor is not tracked locally.
- `Terminator::Send` codegen reverts to trapping on any non-zero status,
  since a policy-drop now arrives as `0`.

As a result, `overflow fail` also becomes fail-closed on a fire-and-forget
send: it previously fell into the same silently-dropped bucket as
`drop_new`/`drop_old`; it now returns a genuine failure code and traps.

Also drops an internal reference in a code comment that doesn't belong in
committed source.
… fence

Reword the mailbox send-failure test comments to plain invariants, and mark the coalesce doctest fence as not-yet-implemented since that policy now fails closed.
Remove G-A1 shorthand from comments, rephrasing to plain technical invariants:
- hew-mir/src/lower.rs line 1740
- hew-mir/src/model.rs line 6073
- tests/vertical-slice/accept/mailbox_bounded_drop_new.hew line 1
- tests/vertical-slice/accept/mailbox_bounded_drop_new_supervised.hew line 1
- tests/vertical-slice/reject/mailbox_overflow_coalesce_nyi.hew line 1
- tests/vertical-slice/run.sh lines 1481, 1495, 1501

Comment prose only — no code or test logic changed.
@slepp slepp enabled auto-merge (squash) July 4, 2026 03:45
@slepp slepp merged commit f95d0fc into main Jul 4, 2026
12 checks passed
@slepp slepp deleted the fix/mailbox-capacity branch July 4, 2026 04:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant