Summary
This is the fundamental follow-up beyond #277 and #281.
- #277 moved dispatch authority out of DispatchContext / DispatchState and onto machine-owned VM state
- #281 tracks the remaining immediate compromise around dispatch-root reachability
This issue is broader:
Redesign continuations so they carry enough machine-owned control state to resume as first-class suspended computations, aligned as closely as practical with the OCaml 5 model.
The point is not merely to delete one map or add better resume patches.
The point is to eliminate the architectural reason ad hoc resume reconstruction was needed in the first place.
Context
The runtime is already much cleaner than before:
- DispatchContext is gone as runtime authority
- DispatchState is gone
- dispatch is modeled by machine-owned Frame::Dispatch(...)
- transient HandlerDispatch wrapper frames are gone
- handler completion now routes through an explicit handler boundary frame
- traceback repair heuristics were reduced by moving more truth into Rust-side trace emission
- the suite has reached green states repeatedly during this work
Current core validation baseline remains:
make sync
cargo check -q -p doeff-vm --manifest-path packages/doeff-vm/Cargo.toml
uv run pytest -q
Clarification: The Real Remaining Problem
The remaining problem is not just where a lookup map lives.
It is how the VM represents a suspended computation.
The current design still leans too much toward:
- continuation = execution snapshot (frames_snapshot, mode, pending python, scope store)
- outer handler/interceptor structure = reconstructed later from surrounding machine state
That split is what keeps producing edge cases like:
- resumed computation losing the outer cache handler
  - CachePutEffect becoming unhandled after an Await
- resumed code observing the wrong handler/interceptor topology
- traceback needing compensating logic
The correct target is:
a continuation should be a first-class suspended computation object, carrying enough structural control context that resume is reactivation, not heuristic reconstruction.
Why The Existing “Topology Reinstall” Framing Is Not Final
Saying “reinstall the topology” is only acceptable as an intermediate description.
As a final architecture target, it sounds like a patch layered on top of an insufficient continuation representation.
The target should be stated more precisely:
- continuation owns both execution-state snapshot and structural-boundary snapshot
- resume reactivates that object into machine state
- shared mutable state remains outside the continuation, in handler-local / runtime store
- resume does not guess or repair outer handler/interceptor structure from incidental caller state
Correct Architecture Target
Continuation model
A continuation must carry two kinds of machine-owned state:
- Execution snapshot
  - frames / resumable control stack
  - mode
  - pending python state
  - scope store
  - marker / caller linkage needed for re-entry
- Structural boundary snapshot
  - prompt boundaries (WithHandler installations)
  - interceptor boundaries (WithIntercept installations)
  - enough identity to preserve which handler/interceptor installations are visible after resume
This must apply to both:
- started continuations captured from running code
- unstarted continuations created explicitly
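As a concrete shape for the two-snapshot continuation described above, a minimal sketch follows. All names (Boundary, ExecutionSnapshot, Continuation::capture) are illustrative stand-ins, not the real doeff-vm types:

```rust
// Hypothetical sketch of a continuation that owns both an execution
// snapshot and a structural-boundary snapshot.

#[derive(Clone, Debug, PartialEq)]
pub enum Boundary {
    Handler { scope_id: u64 },
    Interceptor { id: u64 },
}

#[derive(Clone, Debug)]
pub struct ExecutionSnapshot {
    pub frames: Vec<String>, // stand-in for the resumable control stack
    pub mode: String,        // stand-in for the VM mode
}

#[derive(Clone, Debug)]
pub struct Continuation {
    pub execution: ExecutionSnapshot,
    // Outer-to-inner chain of handler/interceptor installations that must
    // be visible again after resume.
    pub boundaries: Vec<Boundary>,
}

impl Continuation {
    // Capture records both kinds of state at suspension time, so resume
    // never has to rediscover the boundary chain from caller state.
    pub fn capture(frames: Vec<String>, mode: &str, boundaries: Vec<Boundary>) -> Self {
        Continuation {
            execution: ExecutionSnapshot {
                frames,
                mode: mode.to_string(),
            },
            boundaries,
        }
    }
}
```

The same constructor serves started continuations (capture from a running machine) and unstarted ones (capture with empty frames and an explicit boundary chain).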
Shared mutable state
The continuation must not clone or own shared mutable runtime state.
That state stays external and shared, for example:
- handler-local store cells
- scheduler state
- semaphore state
- promise tables
- cache handler state
From the VM’s perspective, these remain opaque, store-backed pieces of mutable state.
The continuation must only preserve enough control/topology information to reach and use them again after resume.
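The "reach it, don't clone it" rule can be sketched as a continuation holding only a key into an external store. RuntimeStore, ContinuationRef, and resume_and_touch are hypothetical names for illustration:

```rust
use std::collections::HashMap;

// Sketch: the continuation keeps only a key into an external shared store,
// never a clone of the mutable state itself.

type StoreKey = u64;

pub struct RuntimeStore {
    pub cells: HashMap<StoreKey, i64>, // e.g. a semaphore count or cache slot
}

pub struct ContinuationRef {
    pub store_key: StoreKey, // just enough to reach the shared state again
}

// Resuming reaches the *same* cell every other computation sees; the
// mutation is globally visible because nothing was copied into the
// continuation.
pub fn resume_and_touch(store: &mut RuntimeStore, k: &ContinuationRef) -> i64 {
    let cell = store.cells.get_mut(&k.store_key).expect("cell exists");
    *cell += 1;
    *cell
}
```

Two continuations sharing a key observe each other's mutations, which is exactly the behavior semaphore and cache handlers rely on.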
Live authority
Live authority for installed handler scope remains:
PromptBoundary.handler_scope_id
That remains the runtime source of truth for active scope.
Any continuation-carried scope metadata is snapshot metadata, not live authority.
What This Means Operationally
Resume semantics
Resume(k, value) / Transfer(k, value) should be defined as:
- reactivate a suspended computation object
- re-enter it under the same structural handler/interceptor envelope it had when captured
- continue stepping from there
Not as:
- materialize a bare execution segment from frames only
- then try to reconstruct the missing outer structure from current caller state
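The distinction can be made concrete in a small sketch of resume-as-reactivation, where the structural envelope comes from the continuation itself rather than from the caller. All types here (Machine, Boundary, resume) are hypothetical:

```rust
// Sketch of Resume(k, value) as reactivation: the machine reinstalls the
// envelope the continuation itself carries, then continues stepping from
// its captured frames. Nothing is inferred from the current caller chain.

#[derive(Clone, Debug, PartialEq)]
pub enum Boundary {
    Handler(u64),
    Interceptor(u64),
}

#[derive(Default)]
pub struct Machine {
    pub boundaries: Vec<Boundary>, // currently installed prompt/intercept stack
    pub frames: Vec<String>,
}

pub struct Continuation {
    pub boundaries: Vec<Boundary>,
    pub frames: Vec<String>,
}

pub fn resume(m: &mut Machine, k: Continuation, value: i64) {
    // Reactivation: the structural envelope comes from the continuation...
    m.boundaries.extend(k.boundaries);
    // ...and execution continues from its own frames, seeded with `value`.
    m.frames.extend(k.frames);
    m.frames.push(format!("deliver {value}"));
}
```

Because resume consumes only k's own fields, there is no point at which caller-chain guesswork could enter.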
Consequence
The final architecture should not rely on ad hoc “reinstall helper” logic as the conceptual model.
Implementation may still temporarily build segments/prompts during activation, but the design target is:
- continuation object already owns the structure that is being reactivated
- no semantic dependence on caller-chain guesswork or heuristic reattachment
Concrete Data Model To Aim At
The exact names may differ, but the direction should be explicit.
Continuation-owned structural entries
Conceptually, started and unstarted continuations should both be able to carry a chain like:
enum ReinstallChainEntry {
    Handler(HandlerInstallSpec),
    Interceptor(InterceptorInstallSpec),
}

struct HandlerInstallSpec {
    handler: KleisliRef,
    identity: Option<PyShared>,
    handler_scope_id: Option<HandlerScopeId>,
    types: Option<Vec<PyShared>>,
}

struct InterceptorInstallSpec {
    interceptor: KleisliRef,
    types: Option<Vec<PyShared>>,
    mode: InterceptMode,
    metadata: Option<CallMetadata>,
}
This is not meant as a long-term “patch helper”; it is the structural part of the continuation.
Ownership split still matters
The owner/visible dispatch split discovered earlier is still valid:
- bookkeeping owner
- resumed-into / visible dispatch affiliation
But that split belongs inside a stronger continuation representation, not alongside a continuation that only partially knows its own topology.
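Carried inside the continuation, the split could be as small as a pair of ids. DispatchId and DispatchLink are hypothetical illustrations, not existing types:

```rust
// Sketch of the owner/visible split held by the continuation itself: the
// dispatch doing bookkeeping for k need not be the dispatch whose
// topology resumed code observes.

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct DispatchId(pub u32);

pub struct DispatchLink {
    pub owner: DispatchId,   // bookkeeping owner of the continuation
    pub visible: DispatchId, // dispatch affiliation resumed code sees
}

impl DispatchLink {
    // Nested dispatch is exactly the case where the two ids diverge.
    pub fn is_nested(&self) -> bool {
        self.owner != self.visible
    }
}
```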
Scheduler Implication
Scheduler remains user-space handler logic.
It must not become the semantic owner of dispatch/continuation rules.
The continuation model itself must be strong enough that scheduler can simply transport suspended computations.
That means:
- no scheduler-specific reattachment hacks
- no special VM-side lookup state for outer handler recovery
- no need for scheduler to reconstruct missing handler topology
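If the continuation model is strong enough, the scheduler's whole job reduces to a queue, sketched below with hypothetical types (Suspended, Scheduler):

```rust
use std::collections::VecDeque;

// Sketch: with structurally complete continuations, the scheduler is pure
// transport -- a queue of opaque suspended computations. It never inspects
// or repairs handler topology.

pub struct Suspended(pub &'static str); // opaque to the scheduler

#[derive(Default)]
pub struct Scheduler {
    ready: VecDeque<Suspended>,
}

impl Scheduler {
    pub fn park(&mut self, k: Suspended) {
        self.ready.push_back(k);
    }
    // Hand the next suspended computation back to the VM untouched.
    pub fn next_ready(&mut self) -> Option<Suspended> {
        self.ready.pop_front()
    }
}
```

The point of the sketch is what is absent: no scope ids, no handler chains, no VM hooks inside the scheduler.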
Await Implication
The final sync Await design should follow the same rule:
- sync Await runtime is Rust-owned and handler-local
- continuation/object reactivation must preserve the outer handler/interceptor structure naturally
- resumed code after Await must still see outer handlers such as cache handlers and interceptors without special post-hoc repair
Traceback / Trace Implication
Traceback should become a pure projection of runtime truth.
That requires:
- handler identity in trace payloads should come from runtime-owned stable identity (for example handler_scope_id)
- active-chain suppression / stack attach should happen during active-chain assembly, not in the formatter
- doeff/traceback.py should become render-only
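"Pure projection of runtime truth" can be sketched as a render-only function over runtime-stamped payloads. TraceEntry and render are hypothetical names:

```rust
// Sketch of traceback as a pure projection: the formatter renders payloads
// the runtime already stamped with stable identity (such as a handler
// scope id) and performs no repair or suppression of its own.

pub struct TraceEntry {
    pub handler_scope_id: u64, // runtime-owned stable identity
    pub label: String,
}

// Render-only: a pure function of runtime truth. Any suppression or
// stack-attach decisions happened earlier, during active-chain assembly.
pub fn render(entries: &[TraceEntry]) -> String {
    entries
        .iter()
        .map(|e| format!("scope {}: {}", e.handler_scope_id, e.label))
        .collect::<Vec<_>>()
        .join("\n")
}
```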
Exact Current Code Areas
Core runtime:
packages/doeff-vm-core/src/continuation.rs
packages/doeff-vm-core/src/vm.rs
packages/doeff-vm-core/src/segment.rs
packages/doeff-vm-core/src/dispatch.rs
packages/doeff-vm-core/src/trace_state.rs
Scheduler / transport:
packages/doeff-core-effects/src/scheduler/mod.rs
Await:
packages/doeff-core-effects/src/handlers/mod.rs
doeff/handlers/await_handlers.py
What Has Already Been Learned From Experiments
Experiments have already established that:
- pure caller-chain recovery is insufficient
- a bare execution snapshot is insufficient for started continuation resume
- nested dispatch exposes that bookkeeping owner and resumed-into topology are not the same concept
- simply copying a few scope ids is not enough
- the remaining failures are strongly tied to continuation topology, not just mutable state placement
Those experiments should inform the redesign instead of being repeated blindly.
Validation Targets
The main regression detectors remain:
tests/effects/test_finally_semaphore_over_release.py
tests/effects/test_effect_combinations.py
tests/test_dispatch_completion.py
tests/core/test_runtime_regressions_manual.py
tests/core/test_traceback_format_default.py
tests/core/test_spec_trace_001_examples.py
tests/core/test_traceback_spec_compliance.py
uv run pytest -q
Acceptance Criteria
- A continuation is explicitly treated as a first-class suspended computation object, not merely a frame snapshot that requires heuristic outer-structure repair
- Started continuation resume preserves outer handler/interceptor topology without relying on ad hoc caller-chain reconstruction
- Shared mutable runtime state remains external and store-backed; it is not cloned into continuations
- Live scope authority remains PromptBoundary.handler_scope_id
- make sync, cargo check -q -p doeff-vm --manifest-path packages/doeff-vm/Cargo.toml, and uv run pytest -q all stay green
Relationship To Other Issues
#277: primary dispatch architecture rewrite
#281: immediate follow-up to remove remaining dispatch-root lookup compromise
This issue is the architectural parent of both the remaining started-continuation and traceback work.
If #281 is “delete the last fallback map,” this issue is now more precisely:
make continuations structurally complete enough that resume is reactivation of suspended computation state, not heuristic reconstruction of lost topology.