
OCaml5-aligned redesign: make continuation/task ownership carry dispatch roots #282

@proboscis

Description


Summary

This is the fundamental follow-up beyond #277 and #281.

  • #277 moved dispatch authority out of DispatchContext / DispatchState and onto machine-owned VM state
  • #281 tracks the remaining immediate compromise around dispatch-root reachability

This issue is broader:

Redesign continuations so they carry enough machine-owned control state to resume as first-class suspended computations, aligned as closely as practical with the OCaml 5 model.

The point is not merely to delete one map or add better resume patches.
The point is to eliminate the architectural reason ad hoc resume reconstruction was needed in the first place.

Context

The runtime is already much cleaner than before:

  • DispatchContext is gone as runtime authority
  • DispatchState is gone
  • dispatch is modeled by machine-owned Frame::Dispatch(...)
  • transient HandlerDispatch wrapper frames are gone
  • handler completion now routes through an explicit handler boundary frame
  • traceback repair heuristics were reduced by moving more truth into Rust-side trace emission
  • the test suite has repeatedly returned to green during this work

Current core validation baseline remains:

  • make sync
  • cargo check -q -p doeff-vm --manifest-path packages/doeff-vm/Cargo.toml
  • uv run pytest -q

Clarification: The Real Remaining Problem

The remaining problem is not just where a lookup map lives.
It is how the VM represents a suspended computation.

The current design still leans too much toward:

  • continuation = execution snapshot (frames_snapshot, mode, pending python, scope store)
  • outer handler/interceptor structure = reconstructed later from surrounding machine state

That split is what keeps producing edge cases like:

  • resumed computation losing the outer cache handler
  • CachePutEffect becoming unhandled after an Await
  • resumed code observing the wrong handler/interceptor topology
  • traceback needing compensating logic

The correct target is:

a continuation should be a first-class suspended computation object, carrying enough structural control context that resume is reactivation, not heuristic reconstruction.

Why The Existing “Topology Reinstall” Framing Is Not Final

Saying “reinstall the topology” is only acceptable as an intermediate description.
As a final architecture target, it sounds like a patch layered on top of an insufficient continuation representation.

The target should be stated more precisely:

  • continuation owns both execution-state snapshot and structural-boundary snapshot
  • resume reactivates that object into machine state
  • shared mutable state remains outside the continuation, in handler-local / runtime store
  • resume does not guess or repair outer handler/interceptor structure from incidental caller state

Correct Architecture Target

Continuation model

A continuation must carry two kinds of machine-owned state:

  1. Execution snapshot
  • frames / resumable control stack
  • mode
  • pending python state
  • scope store
  • marker / caller linkage needed for re-entry
  2. Structural boundary snapshot
  • prompt boundaries (WithHandler installations)
  • interceptor boundaries (WithIntercept installations)
  • enough identity to preserve which handler/interceptor installations are visible after resume

This must apply to both:

  • started continuations captured from running code
  • unstarted continuations created explicitly
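As a rough illustration of that requirement, here is a minimal sketch in which both the started and unstarted variants carry the structural envelope. All type and field names here are hypothetical stand-ins, not the actual doeff-vm-core definitions:

```rust
// Hypothetical sketch: names are illustrative stand-ins, not doeff-vm-core types.

#[derive(Clone, Debug, PartialEq)]
enum BoundaryEntry {
    Prompt { handler_scope_id: u64 }, // a WithHandler installation
    Intercept { id: u64 },            // a WithIntercept installation
}

#[derive(Clone, Debug, Default, PartialEq)]
struct ExecutionSnapshot {
    frames: Vec<String>, // stand-in for the resumable control stack
    mode: String,        // stand-in for the VM mode
}

#[derive(Clone, Debug, PartialEq)]
enum Continuation {
    // Captured from running code: real frames plus the envelope they ran under.
    Started {
        execution: ExecutionSnapshot,
        envelope: Vec<BoundaryEntry>,
    },
    // Created explicitly and not yet run: no frames yet, but it still carries
    // the envelope it must be activated under.
    Unstarted {
        entry: String,
        envelope: Vec<BoundaryEntry>,
    },
}

impl Continuation {
    // Both variants expose their structural envelope uniformly, so resume
    // never has to ask the caller what the topology was.
    fn envelope(&self) -> &[BoundaryEntry] {
        match self {
            Continuation::Started { envelope, .. } => envelope,
            Continuation::Unstarted { envelope, .. } => envelope,
        }
    }
}
```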

Shared mutable state

The continuation must not clone or own shared mutable runtime state.
That state stays external and shared, for example:

  • handler-local store cells
  • scheduler state
  • semaphore state
  • promise tables
  • cache handler state

From the VM’s perspective, these remain opaque mutable store-backed state.
The continuation must only preserve enough control/topology information to reach and use them again after resume.
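A small sketch of that ownership rule, using `Rc<RefCell<...>>` as a stand-in for the runtime store (the `CacheCell` and `capture` names are hypothetical):

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Illustrative ownership sketch: the continuation holds a handle to shared
// store state, never a clone of the state itself.

type CacheCell = Rc<RefCell<Vec<(&'static str, i32)>>>;

#[derive(Clone)]
struct Continuation {
    // Control/topology fields elided; only the store handle matters here.
    cache: CacheCell,
}

fn capture(cache: &CacheCell) -> Continuation {
    // Rc::clone copies the pointer, not the cache contents: the state stays
    // external and shared.
    Continuation { cache: Rc::clone(cache) }
}
```

Because the continuation holds only a handle, mutations made by handlers while the computation is suspended are visible after resume, with no cloning or merging required.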

Live authority

Live authority for installed handler scope remains:

  • PromptBoundary.handler_scope_id

That remains the runtime source of truth for active scope.
Any continuation-carried scope metadata is snapshot metadata, not live authority.

What This Means Operationally

Resume semantics

Resume(k, value) / Transfer(k, value) should be defined as:

  • reactivate a suspended computation object
  • re-enter it under the same structural handler/interceptor envelope it had when captured
  • continue stepping from there

Not as:

  • materialize a bare execution segment from frames only
  • then try to reconstruct the missing outer structure from current caller state
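The contrast can be sketched with a toy VM in which resume reinstalls the captured envelope wholesale. `Vm`, `Suspended`, and `Envelope` are illustrative names, not the doeff-vm API:

```rust
// Toy resume-as-reactivation sketch.

#[derive(Clone, Debug, Default, PartialEq)]
struct Envelope(Vec<u64>); // handler/interceptor scope ids, outermost first

#[derive(Default)]
struct Vm {
    envelope: Envelope,
    frames: Vec<&'static str>,
}

#[derive(Clone)]
struct Suspended {
    envelope: Envelope,
    frames: Vec<&'static str>,
}

impl Vm {
    // Capture takes both snapshots together.
    fn capture(&self) -> Suspended {
        Suspended {
            envelope: self.envelope.clone(),
            frames: self.frames.clone(),
        }
    }

    // Resume(k, value): reinstall the captured envelope and frames, then
    // deliver the value. Nothing is rebuilt from the caller's current state.
    fn resume(&mut self, k: &Suspended, value: &'static str) {
        self.envelope = k.envelope.clone();
        self.frames = k.frames.clone();
        self.frames.push(value); // stand-in for delivering the resume value
    }
}
```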

Consequence

The final architecture should not rely on ad hoc “reinstall helper” logic as the conceptual model.
Implementation may still temporarily build segments/prompts during activation, but the design target is:

  • continuation object already owns the structure that is being reactivated
  • no semantic dependence on caller-chain guesswork or heuristic reattachment

Concrete Data Model To Aim At

The exact names may differ, but the direction should be explicit.

Continuation-owned structural entries

Conceptually, started and unstarted continuations should both be able to carry a chain like:

```rust
enum ReinstallChainEntry {
    Handler(HandlerInstallSpec),
    Interceptor(InterceptorInstallSpec),
}

struct HandlerInstallSpec {
    handler: KleisliRef,
    identity: Option<PyShared>,
    handler_scope_id: Option<HandlerScopeId>,
    types: Option<Vec<PyShared>>,
}

struct InterceptorInstallSpec {
    interceptor: KleisliRef,
    types: Option<Vec<PyShared>>,
    mode: InterceptMode,
    metadata: Option<CallMetadata>,
}
```

This is not meant as a long-term “patch helper”; it is the structural part of the continuation.
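To make "the structural part of the continuation" concrete, here is a toy activation loop over a reduced version of the chain above, with the install specs collapsed to plain ids; all names are illustrative. The key property is that activation is a replay of continuation-owned structure, not a reconstruction from caller state:

```rust
// Simplified activation sketch (specs collapsed to ids; names illustrative).

#[derive(Clone, Debug, PartialEq)]
enum ReinstallChainEntry {
    Handler { handler_scope_id: u64 },
    Interceptor { id: u64 },
}

#[derive(Default, Debug)]
struct MachineState {
    installed: Vec<ReinstallChainEntry>, // active boundaries, outermost first
}

fn activate(machine: &mut MachineState, chain: &[ReinstallChainEntry]) {
    // Walk outermost-first and install each boundary the continuation owns.
    for entry in chain {
        machine.installed.push(entry.clone());
    }
}
```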

Ownership split still matters

The owner/visible dispatch split discovered earlier is still valid:

  • bookkeeping owner
  • resumed-into / visible dispatch affiliation

But that split belongs inside a stronger continuation representation, not alongside a continuation that only partially knows its own topology.

Scheduler Implication

Scheduler remains user-space handler logic.
It must not become the semantic owner of dispatch/continuation rules.

The continuation model itself must be strong enough that the scheduler can simply transport suspended computations.
That means:

  • no scheduler-specific reattachment hacks
  • no special VM-side lookup state for outer handler recovery
  • no need for scheduler to reconstruct missing handler topology
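The intended end state can be sketched as a scheduler that is nothing but a queue of opaque suspended computations; `Task` and `Scheduler` are hypothetical names for illustration:

```rust
use std::collections::VecDeque;

// Sketch of "scheduler as transport": if a continuation is structurally
// complete, the scheduler only queues and dequeues it.

#[derive(Clone, Debug, PartialEq)]
struct Task {
    id: u32,
    envelope: Vec<u64>, // topology travels inside the suspended computation
}

#[derive(Default)]
struct Scheduler {
    ready: VecDeque<Task>,
}

impl Scheduler {
    // Transport only: no reattachment hacks, no VM-side lookup state,
    // no reconstruction of missing handler topology.
    fn park(&mut self, task: Task) {
        self.ready.push_back(task);
    }

    fn next(&mut self) -> Option<Task> {
        self.ready.pop_front()
    }
}
```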

Await Implication

The final sync Await design should follow the same rule:

  • sync Await runtime is Rust-owned and handler-local
  • continuation-object reactivation must preserve the outer handler/interceptor structure naturally
  • resumed code after Await must still see outer handlers such as cache handlers and interceptors without special post-hoc repair

Traceback / Trace Implication

Traceback should become a pure projection of runtime truth.
That requires:

  • handler identity in trace payloads should come from runtime-owned stable identity (for example handler_scope_id)
  • active-chain suppression / stack attach should happen during active-chain assembly, not in the formatter
  • doeff/traceback.py should become render-only
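A render-only traceback is a pure function of runtime-owned facts. The sketch below assumes a hypothetical `TraceEntry` payload carrying a stable `handler_scope_id`; the names are illustrative, not the trace_state.rs types:

```rust
// Sketch of traceback as a pure projection: rendering consumes runtime-owned
// facts and performs no repair.

struct TraceEntry {
    handler_scope_id: u64, // runtime-owned stable identity
    label: &'static str,
}

// A pure function of runtime truth: same entries in, same text out.
// Suppression and stack-attach decisions happen upstream, during
// active-chain assembly, never here.
fn render(entries: &[TraceEntry]) -> String {
    entries
        .iter()
        .map(|e| format!("  at {} (scope {})", e.label, e.handler_scope_id))
        .collect::<Vec<_>>()
        .join("\n")
}
```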

Exact Current Code Areas

Core runtime:

  • packages/doeff-vm-core/src/continuation.rs
  • packages/doeff-vm-core/src/vm.rs
  • packages/doeff-vm-core/src/segment.rs
  • packages/doeff-vm-core/src/dispatch.rs
  • packages/doeff-vm-core/src/trace_state.rs

Scheduler / transport:

  • packages/doeff-core-effects/src/scheduler/mod.rs

Await:

  • packages/doeff-core-effects/src/handlers/mod.rs
  • doeff/handlers/await_handlers.py

What Has Already Been Learned From Experiments

Experiments have already established that:

  • pure caller-chain recovery is insufficient
  • a bare execution snapshot is insufficient for started continuation resume
  • nested dispatch exposes that bookkeeping owner and resumed-into topology are not the same concept
  • simply copying a few scope ids is not enough
  • the remaining failures are strongly tied to continuation topology, not just mutable state placement

Those experiments should inform the redesign instead of being repeated blindly.

Validation Targets

The main regression detectors remain:

  • tests/effects/test_finally_semaphore_over_release.py
  • tests/effects/test_effect_combinations.py
  • tests/test_dispatch_completion.py
  • tests/core/test_runtime_regressions_manual.py
  • tests/core/test_traceback_format_default.py
  • tests/core/test_spec_trace_001_examples.py
  • tests/core/test_traceback_spec_compliance.py
  • uv run pytest -q

Acceptance Criteria

  1. A continuation is explicitly treated as a first-class suspended computation object, not merely a frame snapshot that requires heuristic outer-structure repair
  2. Started continuation resume preserves outer handler/interceptor topology without relying on ad hoc caller-chain reconstruction
  3. Shared mutable runtime state remains external and store-backed; it is not cloned into continuations
  4. Live scope authority remains PromptBoundary.handler_scope_id
  5. Traceback becomes a pure projection of runtime truth; formatter-side repair is gone
  6. make sync, cargo check -q -p doeff-vm --manifest-path packages/doeff-vm/Cargo.toml, and uv run pytest -q all stay green

Relationship To Other Issues

  • #277: primary dispatch architecture rewrite
  • #281: immediate follow-up to remove remaining dispatch-root lookup compromise

This issue is the architectural parent of both the remaining started-continuation and traceback work.
If #281 is “delete the last fallback map,” this issue is now more precisely:

make continuations structurally complete enough that resume is reactivation of suspended computation state, not heuristic reconstruction of lost topology.
