Harden Codex handoff completion and state isolation by dechow73-maker · Pull Request #235 · awslabs/cli-agent-orchestrator

dechow73-maker · 2026-05-13T09:55:04Z

CAO PR Draft

Date: 2026-05-13 Asia/Shanghai
Local branch: codex/cao-codex-handoff-fixes
Base: awslabs/cli-agent-orchestrator:main at b57a064

Title

Harden Codex handoff completion and state isolation

Body

Summary

This PR fixes two issues found while running live CAO/Codex handoff pilots against current Codex CLI output:

adds a CAO_HOME_DIR override so local pilots and CI can isolate CAO state from ~/.aws/cli-agent-orchestrator;
hardens Codex provider status detection and output extraction so blocking handoff recognizes real final answers while ignoring stale spinner rows and Codex TUI tool-activity bullets.

Details

The state-isolation change keeps CAO-managed directories under the overridden home, including agent-store and agent-context, and documents the override in docs/settings.md.
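
A minimal sketch of how such an override could resolve, assuming the variable is read from the environment and every managed directory is derived from the resulting home. The function names `resolve_cao_home` and `managed_dirs` are hypothetical and are not taken from the PR's source. Only `CAO_HOME_DIR`, the default `~/.aws/cli-agent-orchestrator`, and the `agent-store` and `agent-context` directory names come from the description above.

```python
import os
from pathlib import Path

# Default location named in the PR summary.
DEFAULT_HOME = Path.home() / ".aws" / "cli-agent-orchestrator"


def resolve_cao_home(env=None) -> Path:
    """Return the CAO home, honoring CAO_HOME_DIR when it is set and non-empty.

    Hypothetical helper: the real resolution logic may differ.
    """
    env = os.environ if env is None else env
    override = env.get("CAO_HOME_DIR", "").strip()
    if override:
        return Path(override).expanduser()
    return DEFAULT_HOME


def managed_dirs(home: Path) -> dict:
    """Derive the managed directories from the home so they follow the override."""
    return {name: home / name for name in ("agent-store", "agent-context")}
```

Deriving every directory from one resolved home, rather than hard-coding each path, is what keeps a pilot or CI run from touching the default state directory.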

The Codex provider change recognizes current footer formats such as gpt-5.5 xhigh · /path, anchors footer detection to status-bar-shaped lines, distinguishes tool activity rows from final answer bullets, and extracts the post-tool final response after MCP/tool activity.
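
The detection rules above might be sketched roughly as follows. This is an illustration under stated assumptions, not the provider's actual code: the regular expressions, the list of tool-activity verbs, the continuation markers, and all function names are hypothetical, and stale spinner rows are not handled here. Only the footer example `gpt-5.5 xhigh · /path` comes from the description.

```python
import re

# Assumed status-bar shape: "<model> <effort> · <path>" occupying the whole line,
# so prose that merely mentions a model name does not count as a footer.
FOOTER_RE = re.compile(r"^\s*[\w.\-]+\s+\w+\s+·\s+[/~].*$")

# Hypothetical verbs marking tool-activity bullets; the real TUI set may differ.
TOOL_VERBS = ("Ran", "Called", "Read", "Explored")
# Tool rows are verb-led bullets or their indented continuation lines (assumed markers).
TOOL_ROW_RE = re.compile(r"^\s*(?:•\s+(?:%s)\b|[└│])" % "|".join(TOOL_VERBS))


def is_footer(line: str) -> bool:
    return bool(FOOTER_RE.match(line))


def is_tool_row(line: str) -> bool:
    return bool(TOOL_ROW_RE.match(line))


def extract_final_answer(lines: list) -> str:
    """Return the non-empty text that follows the last tool-activity row."""
    body = [ln for ln in lines if not is_footer(ln)]
    last_tool = max((i for i, ln in enumerate(body) if is_tool_row(ln)), default=-1)
    return "\n".join(ln.strip() for ln in body[last_tool + 1:] if ln.strip())


screen = [
    "• Called cao-mcp-server.handoff",
    "  └ success",
    "",
    "• SESSION_STATE.md exists: yes",
    "",
    "gpt-5.5 xhigh · /path",
]
print(extract_final_answer(screen))
```

A bullet is treated as a final answer simply because it does not start with a known tool verb, so the verb list is the part most likely to need tuning against real output.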

Testing

uv run pytest test/providers/test_codex_provider_unit.py test/services/test_settings_service.py test/test_constants.py
125 passed

uv run pytest test/ --ignore=test/providers/test_q_cli_integration.py --ignore=test/providers/test_kiro_cli_integration.py --ignore=test/e2e -m 'not e2e'
1643 passed, 1 skipped, 93% coverage

uv run python -m compileall -q src test/test_constants.py test/providers/test_codex_provider_unit.py test/services/test_settings_service.py
uv run black --check src/ test/
uv run isort --check-only src/ test/
uv run mypy src/cli_agent_orchestrator/constants.py src/cli_agent_orchestrator/providers/codex.py src/cli_agent_orchestrator/services/settings_service.py
git diff --check
uv build

cd web && npm ci
cd web && npm test
cd web && npx tsc --noEmit
cd web && npm run build

uv run mypy src/ still reports 21 repository-wide pre-existing errors outside the touched source files; the touched-source mypy gate above passes.

Live Smoke

CAO server: 127.0.0.1:9899
Supervisor profile: pilot_codex_supervisor
Worker profile: pilot_codex_inspector
Result: cao-mcp-server.handoff returned success in 28.06s
Worker output confirmed the visible top-level entry SESSION_STATE.md and reported "SESSION_STATE.md exists: yes".

Push And PR Commands

Use these once GitHub auth and a fork/remote are available:

cd "/Volumes/P8/Codex projects/agent-orchestration-eval/repos/cli-agent-orchestrator"

# If needed, create a fork in GitHub first, then add it locally:
git remote add fork https://github.com/<your-github-user>/cli-agent-orchestrator.git

git push -u fork codex/cao-codex-handoff-fixes

Then open:

https://github.com/awslabs/cli-agent-orchestrator/compare/main...<your-github-user>:cli-agent-orchestrator:codex/cao-codex-handoff-fixes?expand=1

Alternative with GitHub CLI after installing/authenticating gh:

gh pr create \
  --repo awslabs/cli-agent-orchestrator \
  --base main \
  --head <your-github-user>:codex/cao-codex-handoff-fixes \
  --title "Harden Codex handoff completion and state isolation" \
  --body-file "/Volumes/P8/Codex projects/agent-orchestration-eval/CAO_PR_DRAFT_2026-05-13.md" \
  --draft

Offline Patch Artifact

Two-commit mailbox patch:

/Volumes/P8/Codex projects/agent-orchestration-eval/patches/cao-codex-handoff-fixes-2commits.mbox

dechow73-maker · 2026-05-13T12:35:36Z

Operator workflow note saved from the local orchestration run, not part of the requested upstream source diff:

Use GPT-5.5 / GPT-5.5 Extra High as the central planning, compaction, adjudication, risk-control, and final-synthesis layer.
Use Qwen 3.6 7B/14B/27B only as verified bounded execution lanes: 7B for cheap extraction/mechanical checks, 14B for structured summaries/checklists, 27B for heavier synthesis or secondary critique.
Use Moonbridge only after route verification as a local routing/fallback layer; do not treat it as final authority.
Use subagents only for narrow verification, audit, extraction, test execution, file inspection, citation checking, or compact context summaries.
Treat all lane/subagent output as provisional until GPT-5.5 checks it against source/context; compare by evidence and tests, never by majority vote.
For recurring agent supervision, use a thread heartbeat automation: check the target agent on the requested interval, respond with exactly "next" only if the target has completed and its next action has been read, otherwise stay quiet and keep the heartbeat active.
Escalate to the human only for missing credentials, destructive/irreversible operations, sensitive data export, costly fanout/retries, raw transcript logging, or material ambiguity.
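
The heartbeat rule in the list above can be sketched as a small polling loop. This is a sketch only: `TargetStatus`, `heartbeat_tick`, `run_heartbeat`, and the `check`/`send` callbacks are hypothetical names, and reading the reply as the literal string "next" is an assumption about the note's intent.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TargetStatus:
    completed: bool          # the target agent has finished its current step
    next_action_read: bool   # the supervisor has read the target's next action


def heartbeat_tick(status: TargetStatus) -> Optional[str]:
    """Return the reply for one heartbeat, or None to stay quiet."""
    if status.completed and status.next_action_read:
        return "next"
    return None


def run_heartbeat(
    check: Callable[[], TargetStatus],
    send: Callable[[str], None],
    interval_s: float,
    max_ticks: int,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll the target every interval_s seconds; return how many replies were sent."""
    sent = 0
    for _ in range(max_ticks):
        reply = heartbeat_tick(check())
        if reply is not None:
            send(reply)
            sent += 1
        sleep(interval_s)  # the heartbeat stays active whether or not a reply was sent
    return sent
```

Keeping the decision in `heartbeat_tick` separate from the loop makes the "stay quiet" branch easy to test without waiting on real intervals.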

Local reusable skill saved at: /Users/lianzhong/.codex/skills/agent-continuation-supervisor/SKILL.md.

codecov-commenter · 2026-05-14T00:37:33Z

Codecov Report

❌ Patch coverage is 91.57895% with 8 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@b57a064). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
src/cli_agent_orchestrator/providers/codex.py	92.59%	6 Missing ⚠️
...li_agent_orchestrator/services/settings_service.py	84.61%	2 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #235   +/-   ##
=======================================
  Coverage        ?   92.78%           
=======================================
  Files           ?       65           
  Lines           ?     5504           
  Branches        ?        0           
=======================================
  Hits            ?     5107           
  Misses          ?      397           
  Partials        ?        0

Flag	Coverage Δ
unittests	`92.78% <91.57%> (?)`
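
The figures in this report can be cross-checked with a little arithmetic. In the sketch below, the hits, misses, and line totals come from the diff block above, while the patch size of 95 lines and the per-file totals of 81 and 13 lines are inferred from the reported percentages and missing-line counts; they are not stated in the report. The helper name `truncated_percent` is hypothetical, and truncating to two decimals is an assumption that happens to match the reported values.

```python
def truncated_percent(hits: int, lines: int) -> float:
    """Percentage truncated (not rounded) to two decimals, via integer math."""
    return (hits * 10000 // lines) / 100


# Project totals from the coverage diff: hits plus misses equals lines.
hits, misses, lines = 5107, 397, 5504
assert hits + misses == lines
overall = truncated_percent(hits, lines)

# Patch size inferred from "91.57895% with 8 lines missing" (an assumption).
patch_missing = 8
patch_lines = round(patch_missing / (1 - 91.57895 / 100))
patch_percent = round((patch_lines - patch_missing) / patch_lines * 100, 5)

# Per-file totals inferred the same way from the "Patch %" and "Lines" columns.
codex_percent = truncated_percent(81 - 6, 81)
settings_percent = truncated_percent(13 - 2, 13)

print(overall, patch_lines, patch_percent, codex_percent, settings_percent)
```

The two listed files account for 94 of the inferred 95 patch lines, which would be consistent with one further changed line that is fully covered and so does not appear under "Files with missing lines".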


dechow73-maker added 2 commits May 13, 2026 13:38

Add CAO_HOME_DIR state isolation

2416e8b

Harden Codex handoff completion detection

864e013

haofeif added the enhancement (New feature or request) label May 14, 2026

dechow73-maker wants to merge 2 commits into awslabs:main from dechow73-maker:codex/cao-codex-handoff-fixes


3 participants
