Reduce runtime latency without degrading deliberation quality, and add first-class observability for scheduling/runtime decisions
Summary
Improve MoralStack’s end-to-end latency by eliminating unnecessary waits, removing duplicated retrieval work, and making orchestration more runtime-aware, without weakening the current deliberative quality.
At the same time, introduce a proper observability model for orchestration decisions so the request detail view can fully explain what happened, why it happened, and what was skipped/reused/cancelled during execution.
This work must be implemented as a sequence of safe, incremental changes and must preserve the current governance semantics.
Motivation
MoralStack is already mature and includes useful latency-oriented primitives such as speculative generation and parallel module execution. However, the current runtime still has important inefficiencies:
- speculative generation is started in parallel with risk estimation, but the controller still waits for both results even when the speculative draft will never be used;
- the deliberative path supports parallel evaluation, but does not always short-circuit expensive modules when the critic has already established a hard violation;
- constitution/relevant-principle retrieval can be repeated multiple times within a single request;
- the domain prefilter cache is invalidated too aggressively;
- some runtime decisions that materially affect latency are not modeled explicitly in persistence or UI, which makes the request detail page incomplete from an audit/debug perspective.
This issue addresses both sides of the problem:
- Performance: reduce avoidable latency without reducing deliberative rigor.
- Auditability: make runtime orchestration decisions first-class, persisted, queryable, and visible in the UI.
Goals
Performance goals
- reduce wall-clock latency by eliminating avoidable blocking;
- reduce duplicated constitution/retrieval work within the same request;
- reduce parse-related retry/jitter in risk and retrieval paths;
- prepare the architecture for future multi-turn conversational governance.
Observability goals
- persist structured runtime/orchestration decisions separately from raw LLM calls;
- make scheduling/gating/reuse decisions visible in the request detail page;
- preserve or improve the auditability of the current pipeline;
- keep markdown/report export aligned with the UI.
Non-goals
This issue does not aim to:
- weaken the deliberative process;
- reduce the number of modules by default;
- blindly reduce the number of deliberation cycles;
- shorten prompts in a way that harms reasoning quality;
- replace structured runtime observability with generic debug logs.
Verified evidence from the current codebase
The following findings were verified against the current codebase and drive this plan:
- Speculative overlap exists but is not truly lazy
  - moralstack/orchestration/controller.py
  - _run_speculative_overlap() currently waits for both risk estimation and speculative generation before returning.
  - Result: in REFUSE / DOMAIN_EXCLUDED style paths, we still pay avoidable waiting time.
- Full parallel evaluation exists but does not provide true latency short-circuiting
  - moralstack/orchestration/deliberation_runner.py
  - Both _run_full_parallel_evaluation() and _run_critic_gated_parallel() already exist.
  - Result: the runtime already has the right primitives, but the scheduler is not yet fully risk-aware.
- Relevant-principles retrieval is duplicated
  - moralstack/orchestration/deliberation_runner.py
  - moralstack/runtime/modules/critic_module.py
  - Result: the same request can trigger repeated get_relevant_principles() work across runner + critic.
- DomainPrefilter cache invalidation is too aggressive
  - moralstack/constitution/retriever.py
  - set_domain_keywords() clears the cache even when the keyword map has not actually changed.
  - Result: cache hit rate is artificially reduced.
- Risk/retrieval stack still has room for structured-output hardening
  - cognitive modules are already more structured;
  - risk estimator and retrieval path still have room to reduce parse-related retries/jitter.
- Current persistence/UI are not enough for new runtime decisions
  - moralstack/persistence/db.py
  - moralstack/persistence/sink.py
  - moralstack/reports/...
  - moralstack/ui/app.py
  - moralstack/ui/templates/request.html
  - Result: LLM calls, decision traces, and debug events exist, but runtime decisions such as “speculative discarded”, “simulator skipped”, “critic short-circuit”, or “retrieval reused” are not yet modeled as first-class entities.
High-level implementation strategy
This work should be implemented in phases. The ordering is important.
Phase 0 — Build observability first
Before introducing new latency optimizations, add the persistence/report/UI substrate needed to explain those optimizations.
Phase 1 — Apply low-risk, high-ROI fixes
- fix unnecessary cache invalidation;
- reduce object/client recreation overhead;
- improve structured outputs in risk/retrieval.
Phase 2 — Remove avoidable blocking
- make speculative overlap truly lazy;
- stop waiting for speculative results when they are not needed.
Phase 3 — Eliminate duplicated retrieval work
- retrieve relevant principles once per request path and reuse downstream.
Phase 4 — Introduce runtime-aware scheduling
- choose between critic_gated and full_parallel based on risk posture;
- make simulator gating and cancellation decisions explicit and observable.
Phase 5 — Add conservative early convergence
- allow cycle-1 convergence only when module outputs are strongly aligned.
Phase 6 — Prepare the schema/runtime for future multi-turn governance
- add conversation-oriented persistence fields and state model foundations.
Scope
A. Observability foundation
A1. Add a first-class orchestration_events persistence layer
Introduce a new table dedicated to runtime/orchestration decisions that are not equivalent to raw LLM calls.
Proposed schema
Add a new orchestration_events table with fields such as:
id
run_id
request_id
cycle
stage
component
event_type
decision
status
sequence
started_at
duration_ms
reason_codes_json
inputs_json
outputs_json
payload_json
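As a rough illustration, the table could be declared as follows. This is a minimal sketch that assumes SQLite as the backing store, which the plan does not state. Only the column names come from the field list above; the column types, the index, and the helper names `create_orchestration_events_table` and `record_event` are hypothetical.

```python
import json
import sqlite3

# Hypothetical DDL: column names follow the proposed field list; the types
# and the index are assumptions made for illustration only.
ORCHESTRATION_EVENTS_DDL = """
CREATE TABLE IF NOT EXISTS orchestration_events (
    id                INTEGER PRIMARY KEY AUTOINCREMENT,
    run_id            TEXT NOT NULL,
    request_id        TEXT NOT NULL,
    cycle             INTEGER,
    stage             TEXT,
    component         TEXT,
    event_type        TEXT NOT NULL,
    decision          TEXT,
    status            TEXT,
    sequence          INTEGER NOT NULL,
    started_at        TEXT,
    duration_ms       REAL,
    reason_codes_json TEXT,
    inputs_json       TEXT,
    outputs_json      TEXT,
    payload_json      TEXT
);
CREATE INDEX IF NOT EXISTS idx_orch_events_request
    ON orchestration_events (request_id, sequence);
"""


def create_orchestration_events_table(conn: sqlite3.Connection) -> None:
    """Create the table and its index if they do not exist yet."""
    conn.executescript(ORCHESTRATION_EVENTS_DDL)


def record_event(conn, run_id, request_id, event_type, sequence,
                 decision=None, reason_codes=()):
    """Insert one orchestration event and return its row id."""
    cur = conn.execute(
        "INSERT INTO orchestration_events "
        "(run_id, request_id, event_type, sequence, decision, reason_codes_json) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (run_id, request_id, event_type, sequence, decision,
         json.dumps(list(reason_codes))),
    )
    conn.commit()
    return cur.lastrowid
```

Indexing on (request_id, sequence) is one way to let the request detail page read events in execution order with a single query.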
Why this is needed
debug_events are useful as raw logs, but they are too generic to drive a clear request detail UI. We need a queryable, stable runtime event model for:
- scheduler strategy selection;
- speculative start/join/use/discard;
- simulator gate decisions;
- retrieval reuse decisions;
- cache invalidation/hit/miss events;
- convergence decisions;
- module cancellation / short-circuit outcomes.
Files to update
- moralstack/persistence/db.py
- moralstack/persistence/sink.py
- moralstack/persistence/__init__.py
- any relevant persistence helpers
A2. Extend llm_calls with runtime outcome metadata
Add minimal metadata to llm_calls so the UI can tell whether a call was:
- normal or speculative;
- used or discarded;
- cached/reused/skipped/cancelled.
New fields
call_kind (normal | speculative | synthetic)
call_outcome (used | discarded | skipped | cancelled | cached | none)
cache_status (hit | miss | reused | none)
related_event_id (optional)
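One way to add these columns without breaking existing rows is an additive migration. The sketch below assumes SQLite. The helper name `extend_llm_calls` and the default values (`'normal'`, `'none'`) are assumptions chosen so that pre-existing rows read as ordinary calls; they are not taken from the codebase.

```python
import sqlite3

# Hypothetical defaults: old rows should read as normal, non-speculative calls.
NEW_LLM_CALL_COLUMNS = {
    "call_kind": "TEXT NOT NULL DEFAULT 'normal'",
    "call_outcome": "TEXT NOT NULL DEFAULT 'none'",
    "cache_status": "TEXT NOT NULL DEFAULT 'none'",
    "related_event_id": "INTEGER",
}


def extend_llm_calls(conn: sqlite3.Connection) -> list:
    """Add any missing columns to llm_calls and return the names added."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute("PRAGMA table_info(llm_calls)")}
    added = []
    for name, ddl in NEW_LLM_CALL_COLUMNS.items():
        if name not in existing:
            conn.execute(f"ALTER TABLE llm_calls ADD COLUMN {name} {ddl}")
            added.append(name)
    conn.commit()
    return added
```

Because the function checks the existing columns first, running it twice is harmless, which keeps startup migrations safe to repeat.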
Why this is needed
A speculative generation call should not look identical to a normal call in the UI.
Files to update
- moralstack/persistence/db.py
- moralstack/persistence/sink.py
- all code paths persisting llm_calls
A3. Add new decision trace stages
Introduce explicit audit-grade trace stages for:
RISK_ASSESSMENT
REQUEST_ANALYSIS_CONTEXT
CYCLE_SUMMARY
RISK_ASSESSMENT should include
risk_score
risk_category
operational_risk
intent_to_harm
requested_instructions
intent_operational
risk_policy_action
estimation_mode
detected_domain
activated_signals
REQUEST_ANALYSIS_CONTEXT should include
relevant_principles
constitution_domain
retrieval_count
reuse_targets
prefilter_cache_status
parallel_retrieval
CYCLE_SUMMARY should include
cycle
scheduler_strategy
modules_planned
modules_executed
modules_skipped
modules_cancelled
critic_decision
violations_count
violated_hard
semantic_expected_harm
perspectives_weighted_approval
convergence_decision
convergence_reason
next_action
Files to update
- moralstack/orchestration/controller.py
- moralstack/orchestration/deliberation_runner.py
- trace/decision logging helpers
A4. Upgrade the request detail UI
The request detail page should expose runtime decisions clearly and not rely only on raw debug events.
Add new UI sections
- Execution Strategy
  - risk estimation mode
  - speculative generation outcome
  - scheduler strategy
  - retrieval reuse summary
  - simulator gating summary
  - convergence summary
- Runtime Decisions
  - ordered table of orchestration events
  - columns like:
    - cycle
    - stage
    - component
    - event
    - decision
    - reason
    - duration
    - status/badge
- Cycle Cards
  - one card per deliberation cycle
  - strategy chosen
  - modules planned/executed/skipped/cancelled
  - critic outcome
  - simulator outcome or gate reason
  - perspectives outcome
  - convergence decision
- Semantic Timeline Enhancements
  - speculative call badges
  - cancelled modules
  - skipped modules
  - reused computations
  - cache hits
Keep existing raw diagnostics
- retain Debug Events / raw traces as advanced fallback sections;
- they should not remain the primary way to understand runtime behavior.
Files to update
- moralstack/reports/model.py
- moralstack/reports/orchestrator_observability.py
- add new report/view-model helpers as needed
- moralstack/ui/app.py
- moralstack/ui/templates/request.html
- related static assets if needed
B. High-ROI performance fixes
B1. Make DomainPrefilter.set_domain_keywords() idempotent
Problem
The prefilter cache is invalidated even when the keyword map has not changed.
Change
Only invalidate if the effective keyword configuration actually changed.
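A minimal sketch of the idempotent behavior, using a stand-in class rather than the real DomainPrefilter. The class name, the `detect` method, the `invalidations` counter, and the choice to compare keyword maps case-insensitively and order-insensitively are all assumptions made for illustration.

```python
from typing import Dict, FrozenSet, Iterable, Mapping, Optional


class DomainPrefilterSketch:
    """Illustrative stand-in, not the real DomainPrefilter."""

    def __init__(self) -> None:
        self._keywords: Dict[str, FrozenSet[str]] = {}
        self._cache: Dict[str, Optional[str]] = {}
        self.invalidations = 0  # counter a real version might emit as events

    @staticmethod
    def _normalize(keyword_map: Mapping[str, Iterable[str]]) -> Dict[str, FrozenSet[str]]:
        # Assumed canonical form: lowercase keywords, order ignored.
        return {d: frozenset(k.lower() for k in kws) for d, kws in keyword_map.items()}

    def set_domain_keywords(self, keyword_map: Mapping[str, Iterable[str]]) -> bool:
        """Return True only if the effective configuration changed."""
        new = self._normalize(keyword_map)
        if new == self._keywords:
            return False  # no-op: keep the cache warm
        self._keywords = new
        self._cache.clear()
        self.invalidations += 1
        return True

    def detect(self, text: str) -> Optional[str]:
        """Return the first domain with a keyword in the text, caching the result."""
        if text in self._cache:
            return self._cache[text]
        lowered = text.lower()
        result = next(
            (d for d, kws in self._keywords.items() if any(k in lowered for k in kws)),
            None,
        )
        self._cache[text] = result
        return result
```

Returning a boolean from the setter gives the caller a natural place to decide whether an invalidation event should be recorded.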
Expected benefit
Improved cache hit rate and lower repeated domain prefilter cost.
Required observability
Emit structured events such as:
DOMAIN_PREFILTER_CACHE_HIT
DOMAIN_PREFILTER_CACHE_MISS
DOMAIN_PREFILTER_CACHE_INVALIDATED
Files
moralstack/constitution/retriever.py
B2. Reuse OpenAI clients / policy objects where safe
Problem
The retrieval stack and some mini-estimator paths recreate clients/policy objects too often.
Change
Introduce safe client/policy pooling/reuse to avoid repeated initialization overhead.
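One simple form of reuse is a memoized factory keyed on the configuration. The sketch below uses a placeholder `FakeClient` class so that it runs without any SDK; in practice the factory body would construct the real client. The names `FakeClient` and `get_client` and the cache size are hypothetical.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class FakeClient:
    """Placeholder for a real API client; present only to keep the sketch runnable."""
    base_url: str
    timeout_s: float


@lru_cache(maxsize=8)
def get_client(base_url: str, timeout_s: float = 30.0) -> FakeClient:
    """Return a shared client for this configuration, creating it on first use."""
    return FakeClient(base_url=base_url, timeout_s=timeout_s)
```

Two caveats apply to this approach. `functools.lru_cache` keeps its cache consistent under concurrent access, but the wrapped function can still run more than once for the same key if two threads miss at the same time, so the factory should be cheap and free of side effects. Reuse is also only safe when the underlying client object is itself safe to share across threads.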
Expected benefit
Lower infrastructure overhead and improved connection reuse.
Notes
This is mainly an efficiency change and does not need prominent request-detail visualization, but may be surfaced in advanced diagnostics or run-level summaries.
Files
moralstack/constitution/retriever.py
moralstack/models/risk/estimator.py
B3. Strengthen structured-output enforcement in risk/retrieval
Problem
Risk/retrieval still have avoidable parse-related retries/jitter.
Change
Use response_format={"type":"json_object"} where appropriate and update parsers/fallbacks accordingly.
Expected benefit
Fewer malformed responses, fewer retries, lower jitter.
Required observability
Persist or surface:
- parse status
- parse attempts
- response contract type
- fallback usage
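On the parsing side, the observability fields listed above could be produced by a small helper like the following. This is a sketch under assumptions: the `ParseOutcome` type, the status strings, and the brace-extraction fallback are illustrative and do not describe the current parsers.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ParseOutcome:
    data: Optional[Dict[str, Any]]
    parse_status: str       # assumed values: "ok" | "fallback" | "failed"
    parse_attempts: int
    fallback_used: bool


def parse_json_object(raw: str) -> ParseOutcome:
    """Parse a model response that is expected to be a single JSON object."""
    # Attempt 1: strict parse of the whole response.
    try:
        data = json.loads(raw)
        if isinstance(data, dict):
            return ParseOutcome(data, "ok", 1, False)
    except json.JSONDecodeError:
        pass
    # Attempt 2 (fallback): parse the outermost brace-delimited span, which
    # tolerates prose wrapped around the object.
    start, end = raw.find("{"), raw.rfind("}")
    if 0 <= start < end:
        try:
            data = json.loads(raw[start:end + 1])
            if isinstance(data, dict):
                return ParseOutcome(data, "fallback", 2, True)
        except json.JSONDecodeError:
            pass
        return ParseOutcome(None, "failed", 2, True)
    return ParseOutcome(None, "failed", 1, False)
```

With JSON mode enabled on the request, the first attempt should succeed in the common case, so a rising share of "fallback" or "failed" outcomes becomes a useful regression signal.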
Files
moralstack/models/risk/estimator.py
moralstack/constitution/retriever.py
C. Remove avoidable controller blocking
C1. Make speculative overlap truly lazy
Problem
Speculative generation is started in parallel with risk estimation, but the controller still blocks on the speculative result even when it will never be used.
Change
Refactor _run_speculative_overlap() so that it returns:
- the resolved risk estimation;
- a handle/future for speculative generation.
Only join the speculative future in branches that actually consume the draft.
Required events
SPECULATIVE_STARTED
SPECULATIVE_JOIN_REQUIRED
SPECULATIVE_JOIN_SKIPPED
SPECULATIVE_RESULT_USED
SPECULATIVE_RESULT_DISCARDED
Required llm_calls metadata
call_kind = speculative
call_outcome = used | discarded
Expected benefit
Avoidable blocking disappears in paths where speculative output is irrelevant.
Files
- moralstack/orchestration/controller.py
- related persistence helpers
D. Eliminate duplicated request-level retrieval work
D1. Introduce a request-scoped analysis context
Change
Add a request-scoped context object (for example RequestAnalysisContext) carrying:
- relevant_principles
- constitution
- detected_domain
- retrieval metadata
- prefilter cache status
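A possible shape for such a context object is shown below. The field types, the default values, and the decision to make the object immutable are assumptions; only the field names follow the list above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple


@dataclass(frozen=True)
class RequestAnalysisContext:
    """Hypothetical request-scoped container for retrieval results."""
    relevant_principles: Tuple[str, ...]
    constitution: Any                     # the loaded constitution object
    detected_domain: Optional[str]
    retrieval_metadata: Dict[str, Any] = field(default_factory=dict)
    prefilter_cache_status: str = "none"  # assumed values: hit | miss | none
```

Freezing the dataclass prevents downstream modules from rebinding its fields, which makes it easier to argue that every consumer saw the same retrieval result.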
Files
moralstack/orchestration/types.py
moralstack/orchestration/deliberation_runner.py
D2. Retrieve relevant principles once and reuse them downstream
Problem
Relevant-principles retrieval can happen multiple times for a single request.
Change
Perform retrieval once at the beginning of the deliberative path and pass the result to downstream consumers such as the critic.
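The hand-off can be as simple as an optional parameter on the downstream consumer. In this sketch the function names, the signature of the retriever callable, and the list used as an event sink are hypothetical; the critic keeps a retrieval fallback so that existing call sites continue to work.

```python
from typing import Callable, List, Optional, Sequence

Retriever = Callable[[str], Sequence[str]]


def run_critic(
    draft: str,
    retriever: Retriever,
    events: List[str],
    relevant_principles: Optional[Sequence[str]] = None,
) -> Sequence[str]:
    """Stand-in critic: returns the principles it evaluated against."""
    if relevant_principles is None:
        # Backward-compatible path: no precomputed principles were supplied.
        relevant_principles = retriever(draft)
        events.append("RELEVANT_PRINCIPLES_RETRIEVED")
    else:
        events.append("RELEVANT_PRINCIPLES_REUSED")
    return relevant_principles


def run_deliberation(prompt: str, retriever: Retriever, events: List[str]) -> Sequence[str]:
    """Stand-in runner: retrieves once, then passes the result downstream."""
    principles = retriever(prompt)
    events.append("RELEVANT_PRINCIPLES_RETRIEVED")
    return run_critic(prompt, retriever, events, relevant_principles=principles)
```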
Required events
RELEVANT_PRINCIPLES_RETRIEVED
RELEVANT_PRINCIPLES_REUSED
Required trace
REQUEST_ANALYSIS_CONTEXT
Expected benefit
Reduced duplicated retrieval cost without changing policy semantics.
Files
moralstack/orchestration/deliberation_runner.py
moralstack/runtime/modules/critic_module.py
E. Runtime-aware scheduling in the deliberative cycle
E1. Dynamically choose critic_gated vs full_parallel
Problem
The code already supports both strategies, but the runtime is not yet choosing dynamically based on risk posture.
Change
Inside the deliberation runner, choose scheduling strategy based on the request’s posture, for example using signals like:
- high-risk category;
- intent_to_harm = true;
- operational_risk = HIGH;
- request for operational instructions in sensitive contexts;
- prior-cycle hard violations.
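The selection logic could look like the sketch below. It assumes that any one of the listed signals is enough to select critic_gated and that full_parallel is the default otherwise; the plan does not fix that mapping. The `RiskPosture` type and the reason-code strings are also hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class RiskPosture:
    """Hypothetical summary of the signals listed above."""
    high_risk_category: bool
    intent_to_harm: bool
    operational_risk: str                  # e.g. "LOW" | "MEDIUM" | "HIGH"
    sensitive_operational_request: bool
    prior_cycle_hard_violation: bool = False


def select_strategy(posture: RiskPosture) -> Tuple[str, List[str]]:
    """Return (strategy, reason_codes) for a PARALLEL_STRATEGY_SELECTED event."""
    checks = [
        (posture.high_risk_category, "HIGH_RISK_CATEGORY"),
        (posture.intent_to_harm, "INTENT_TO_HARM"),
        (posture.operational_risk == "HIGH", "OPERATIONAL_RISK_HIGH"),
        (posture.sensitive_operational_request, "SENSITIVE_OPERATIONAL_REQUEST"),
        (posture.prior_cycle_hard_violation, "PRIOR_CYCLE_HARD_VIOLATION"),
    ]
    reason_codes = [code for hit, code in checks if hit]
    # Assumption: a likely hard violation favors running the critic first.
    strategy = "critic_gated" if reason_codes else "full_parallel"
    return strategy, reason_codes
```

Returning the reason codes alongside the strategy means the same value can populate both the event's reason_codes_json and CYCLE_SUMMARY.scheduler_strategy.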
Required events
PARALLEL_STRATEGY_SELECTED
CRITIC_SHORT_CIRCUIT_TRIGGERED
PARALLEL_MODULE_CANCEL_ATTEMPTED
PARALLEL_MODULE_CANCELLED
PARALLEL_MODULE_COMPLETED_AFTER_SHORT_CIRCUIT
Required trace
CYCLE_SUMMARY.scheduler_strategy
Expected benefit
Avoid paying for expensive modules in cases where the critic can safely close the path early.
Files
moralstack/orchestration/deliberation_runner.py
E2. Make simulator gating explicit, conservative, and observable
Problem
Simulator gating exists, but it is not yet the default operating mode and its decisions are not first-class in observability.
Change
Enable simulator gating only after observability and tests are in place, and use conservative thresholds/logic such as:
- skip simulator when the critic returns PROCEED with zero violations;
- skip simulator when previous-cycle expected harm is very low and the draft did not materially change.
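The two rules above could be expressed as a pure function whose result feeds the SIMULATOR_GATE_DECISION event. The threshold value, the reason-code strings, and the `GateDecision` type are assumptions; how "materially changed" is computed is left to the caller and passed in as a boolean.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GateDecision:
    run_simulator: bool
    reason_code: str


def simulator_gate(
    critic_decision: str,
    violations_count: int,
    prev_expected_harm: Optional[float],
    draft_changed: bool,
    harm_threshold: float = 0.05,  # hypothetical value for "very low"
) -> GateDecision:
    """Decide whether the simulator should run in this cycle."""
    if critic_decision == "PROCEED" and violations_count == 0:
        return GateDecision(False, "CRITIC_CLEAN")
    if (
        prev_expected_harm is not None
        and prev_expected_harm <= harm_threshold
        and not draft_changed
    ):
        return GateDecision(False, "LOW_PRIOR_HARM_DRAFT_UNCHANGED")
    # Conservative default: run the simulator unless a skip rule clearly applies.
    return GateDecision(True, "GATE_NOT_SATISFIED")
```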
Required events
SIMULATOR_GATE_DECISION
SIMULATOR_EXECUTED
SIMULATOR_SKIPPED
Required trace
- module skip/execution information in CYCLE_SUMMARY
Expected benefit
Reduced cycle cost in clearly converged/clean cases without degrading governance quality.
Files
moralstack/orchestration/config_loader.py
moralstack/orchestration/deliberation_runner.py
F. Conservative early convergence
F1. Allow early cycle-1 convergence only when all signals strongly align
Change
Extend convergence logic so that cycle 1 can terminate early only when:
- critic is clean;
- expected harm is very low;
- perspectives are strongly aligned;
- no residual concern indicates the need for another cycle.
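A conservative check along these lines might look as follows. The threshold defaults and the blocker codes are hypothetical, and the parameter names borrow from the CYCLE_SUMMARY fields; the point of the sketch is that every condition must hold and that each failed condition is reported.

```python
from typing import List, Sequence, Tuple


def evaluate_early_convergence(
    cycle: int,
    violations_count: int,
    violated_hard: bool,
    expected_harm: float,
    weighted_approval: float,
    residual_concerns: Sequence[str],
    max_expected_harm: float = 0.05,     # hypothetical threshold
    min_weighted_approval: float = 0.9,  # hypothetical threshold
) -> Tuple[str, List[str]]:
    """Return (event_name, blockers); early convergence needs zero blockers."""
    blockers: List[str] = []
    if cycle != 1:
        blockers.append("NOT_CYCLE_1")
    if violations_count > 0 or violated_hard:
        blockers.append("CRITIC_NOT_CLEAN")
    if expected_harm > max_expected_harm:
        blockers.append("EXPECTED_HARM_TOO_HIGH")
    if weighted_approval < min_weighted_approval:
        blockers.append("PERSPECTIVES_NOT_ALIGNED")
    if residual_concerns:
        blockers.append("RESIDUAL_CONCERNS")
    event = "EARLY_CONVERGENCE_REJECTED" if blockers else "EARLY_CONVERGENCE_ACCEPTED"
    return event, blockers
```

The blocker list maps naturally onto CYCLE_SUMMARY.convergence_reason, so a rejected early convergence explains itself in the request detail view.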
Required events
CONVERGENCE_EVALUATED
EARLY_CONVERGENCE_ACCEPTED
EARLY_CONVERGENCE_REJECTED
Required trace
CYCLE_SUMMARY.convergence_decision
CYCLE_SUMMARY.convergence_reason
Expected benefit
Avoid unnecessary second cycles in genuinely converged cases.
Files
moralstack/orchestration/convergence_evaluator.py
G. Multi-turn readiness
G1. Prepare persistence/schema for future conversational governance
Change
Add fields such as:
conversation_id
turn_index
parent_request_id
Future direction
This issue does not require full conversational governance yet, but the schema should be ready for it.
Files
- moralstack/persistence/db.py
- request persistence model(s)
G2. Lay groundwork for a future ConversationGovernanceState
Change
Prepare an optional runtime state model that can later support:
- domain carry-forward;
- overlay reuse;
- prior posture carry-forward;
- future delta-based refreshes.
Note
This is preparatory and should come last.
Testing requirements
The implementation must include or update tests for:
Persistence
- orchestration_events creation, insertion, and retrieval
- llm_calls backward-compatible extension
Observability/reporting
- execution strategy view model
- runtime decisions view model
- cycle summary rendering
- markdown export consistency
UI
- request detail rendering of:
- speculative used/discarded
- scheduler strategy
- simulator skipped/executed
- convergence decisions
- retrieval reuse summary
Runtime behavior
- lazy speculative join
- single retrieval + downstream reuse
- dynamic scheduler selection
- simulator gating
- early convergence decisions
Acceptance criteria
This issue is complete only when all of the following are true:
- Latency optimizations are implemented without changing governance semantics.
- A first-class orchestration_events layer exists and is persisted.
- llm_calls can distinguish speculative/used/discarded/cancelled/skipped behavior.
- Request detail UI clearly shows execution strategy and runtime decisions.
- Cycle-level summaries explain what was executed, skipped, reused, or cancelled.
- Relevant-principles retrieval is performed once per request path and reused downstream.
- Speculative generation is no longer awaited in branches that do not consume it.
- Risk-aware scheduling between critic_gated and full_parallel is implemented and observable.
- Simulator gating and early convergence are both conservative, tested, and visible in traces/UI.
- Existing reports and raw diagnostics continue to work.
Recommended implementation order
Block A — Observability before optimization
- add orchestration_events
- extend llm_calls
- add new decision trace stages
- update report/view-model layer
- update request detail UI
Block B — Low-risk, high-ROI fixes
- make set_domain_keywords() idempotent
- reuse clients/policies
- strengthen structured outputs in risk/retrieval
Block C — Remove avoidable waits
- make speculative overlap truly lazy
- introduce request-scoped analysis context and retrieval reuse
Block D — Runtime-aware scheduling
- dynamic critic_gated vs full_parallel
- observable simulator gating
- conservative early convergence
Block E — Future-proofing
- conversation-ready schema fields
- runtime foundations for future conversation state
Risks and cautions
- Do not implement multiple scheduler/gating changes at once without observability.
- Do not rely on raw debug events as the primary UI source.
- Do not reduce latency by weakening the deliberative process.
- Do not change thresholds and scheduling policy in the same commit unless strongly justified and well tested.
- Ensure backward compatibility for existing DB/report/UI paths wherever possible.
Expected outcome
After this issue is completed, MoralStack should provide:
Better latency
Through:
- fewer unnecessary waits;
- less duplicated retrieval work;
- smarter scheduling;
- fewer parse retries/jitter;
- better cache behavior.
Better auditability
Through:
- structured orchestration events;
- richer traces;
- semantically richer LLM call records;
- a request detail page that explains runtime decisions clearly.
Better readiness for multi-turn governance
Through:
- conversation-ready schema foundations;
- explicit separation between LLM calls, audit traces, and runtime orchestration decisions.