Summary
A realtime StoryRun can reach a terminal phase while status.stepStates still reports every step as Running, even after the child StepRuns have been deleted.
This leaves the StoryRun status internally inconsistent and makes post-finish reconciliation/debugging much harder.
Live evidence
StoryRun:
livekit-voice/livekit-voice-assistant-rm-6jwy7uutjydd-241a4c029032ae1d
Observed object state:
- status.phase: Finished
- status.message: StoryRun gracefully canceled
- status.finishedAt: 2026-04-22T18:15:32Z
- all entries in status.stepStates[*].phase still Running
Cluster state at the same time:
kubectl get stepruns -A returned No resources found
Controller logs for the same run showed realtime StepRun deletion cleanup executing successfully:
Reconciling deletion for StepRun
Deleting realtime resource after TTL expiry
Owned resources are deleted, removing finalizer
Why this matters
A terminal StoryRun that still reports every step as Running breaks status invariants:
- the user-facing run state is misleading
- downstream controllers/debug tooling can no longer trust status.stepStates
- it becomes difficult to tell whether the run terminated cleanly or was partially torn down
Suspected root cause area
handleGracefulCancel in internal/controller/runs/storyrun_controller.go:1517-1577 can finish the StoryRun after force-deleting remaining StepRuns, but there is no guaranteed final sync that converts the in-memory step states to terminal values before the StepRuns disappear.
The DAG sync path in internal/controller/runs/dag.go:955-1013 mirrors StepRun phases into StoryRun status while StepRuns still exist. Once the StepRuns are gone, there is no source of truth left to repair the stale Running entries.
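The missing final sync could look roughly like the sketch below: before the remaining StepRuns are force-deleted, every step state that is not already terminal is rewritten to a terminal phase. This is only an illustration under stated assumptions. The Phase and StepState types, the set of terminal phases, the Canceled phase name, and finalizeStepStates are hypothetical stand-ins, not the actual Bobrapet types or the code in handleGracefulCancel.

```go
package main

import "fmt"

// Phase and StepState are simplified, hypothetical stand-ins for the real
// API types; the phase names below are assumptions for illustration.
type Phase string

const (
	PhaseRunning   Phase = "Running"
	PhaseSucceeded Phase = "Succeeded"
	PhaseFailed    Phase = "Failed"
	PhaseFinished  Phase = "Finished"
	PhaseCanceled  Phase = "Canceled"
)

type StepState struct {
	Phase   Phase
	Message string
}

// isTerminal reports whether a phase is in the assumed terminal set.
func isTerminal(p Phase) bool {
	switch p {
	case PhaseSucceeded, PhaseFailed, PhaseFinished, PhaseCanceled:
		return true
	}
	return false
}

// finalizeStepStates rewrites every non-terminal entry to the given terminal
// phase and message, leaving already-terminal entries untouched. It returns
// the number of entries it changed.
func finalizeStepStates(states map[string]StepState, terminal Phase, msg string) int {
	changed := 0
	for name, st := range states {
		if isTerminal(st.Phase) {
			continue
		}
		st.Phase = terminal
		st.Message = msg
		states[name] = st // overwriting an existing key during range is safe
		changed++
	}
	return changed
}

func main() {
	states := map[string]StepState{
		"ingress": {Phase: PhaseRunning},
		"agent":   {Phase: PhaseRunning},
		"setup":   {Phase: PhaseSucceeded},
	}
	n := finalizeStepStates(states, PhaseCanceled, "StoryRun gracefully canceled")
	fmt.Println("rewritten:", n, "ingress:", states["ingress"].Phase, "setup:", states["setup"].Phase)
}
```

Running a pass like this before the StoryRun status is persisted as terminal would mean the status no longer depends on the StepRuns still existing when the DAG sync next runs.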
Acceptance criteria
- A terminal StoryRun must not retain Running entries in status.stepStates
- Bobrapet should mark remaining realtime steps terminal before or during cancellation-based finish
- Add regression coverage for the sequence:
  - realtime StoryRun running
  - cancel requested
  - StepRuns deleted
  - StoryRun becomes terminal
  - status.stepStates is terminal/consistent
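The first criterion can be expressed as an invariant check that a regression test asserts once the StoryRun is terminal. The sketch below is a minimal, hypothetical version: the Phase type, the assumed terminal set, and staleSteps are illustrative names, not existing Bobrapet helpers.

```go
package main

import (
	"fmt"
	"sort"
)

// Phase is a hypothetical stand-in for the real phase type.
type Phase string

// isTerminal uses an assumed terminal set; the real set may differ.
func isTerminal(p Phase) bool {
	switch p {
	case "Succeeded", "Failed", "Finished", "Canceled":
		return true
	}
	return false
}

// staleSteps returns the sorted names of steps that are still non-terminal
// although the run itself is terminal. It returns nil while the run is still
// in progress, because the invariant only applies to terminal runs.
func staleSteps(runPhase Phase, stepPhases map[string]Phase) []string {
	if !isTerminal(runPhase) {
		return nil
	}
	var stale []string
	for name, p := range stepPhases {
		if !isTerminal(p) {
			stale = append(stale, name)
		}
	}
	sort.Strings(stale)
	return stale
}

func main() {
	// Mirrors the observed state: run Finished, every step still Running.
	steps := map[string]Phase{"ingress": "Running", "agent": "Running"}
	fmt.Println("stale steps:", staleSteps("Finished", steps))
}
```

A regression test for the cancel sequence could assert that this list is empty after the StepRuns are deleted and the StoryRun has become terminal.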