Harden Codex handoff completion and state isolation#235

Draft
dechow73-maker wants to merge 2 commits into awslabs:main from dechow73-maker:codex/cao-codex-handoff-fixes

@dechow73-maker

CAO PR Draft

Date: 2026-05-13 Asia/Shanghai
Local branch: codex/cao-codex-handoff-fixes
Base: awslabs/cli-agent-orchestrator:main at b57a064

Title

Harden Codex handoff completion and state isolation

Body

Summary

This PR fixes two issues found while running live CAO/Codex handoff pilots against current Codex CLI output:

  • adds a CAO_HOME_DIR override so local pilots and CI can isolate CAO state from ~/.aws/cli-agent-orchestrator;
  • hardens Codex provider status detection and output extraction so blocking handoff recognizes real final answers while ignoring stale spinner rows and Codex TUI tool-activity bullets.

Details

The state-isolation change keeps CAO-managed directories under the overridden home, including agent-store and agent-context, and documents the override in docs/settings.md.
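
The override described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the function names `resolve_cao_home` and `managed_dirs` are hypothetical, and the fallback for a blank `CAO_HOME_DIR` is an assumption. Only the variable name, the default path, and the `agent-store` and `agent-context` directory names come from the description.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional

# Default location named in the PR summary.
DEFAULT_CAO_HOME = Path.home() / ".aws" / "cli-agent-orchestrator"


def resolve_cao_home(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the CAO home directory, honoring a CAO_HOME_DIR override.

    Hypothetical helper: the real resolution logic lives in the PR diff.
    """
    env = os.environ if env is None else env
    override = env.get("CAO_HOME_DIR", "").strip()
    # Assumption: an unset or blank override falls back to the default.
    return Path(override).expanduser() if override else DEFAULT_CAO_HOME


def managed_dirs(home: Path) -> Dict[str, Path]:
    """Derive managed directories from the resolved home so they move with it."""
    return {name: home / name for name in ("agent-store", "agent-context")}
```

Deriving every managed directory from the resolved home, rather than from the default path, is what keeps a pilot or CI run from touching state under ~/.aws/cli-agent-orchestrator.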

The Codex provider change recognizes current footer formats such as gpt-5.5 xhigh · /path, anchors footer detection to status-bar-shaped lines, distinguishes tool activity rows from final answer bullets, and extracts the post-tool final response after MCP/tool activity.
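
To illustrate what anchoring footer detection to status-bar-shaped lines could look like, here is a rough sketch. The regular expression and the helper names `is_footer_line` and `last_footer_index` are assumptions made for illustration. The only footer shape taken from the description is `gpt-5.5 xhigh · /path`, and the provider's real patterns may differ.

```python
import re
from typing import List, Optional

# Assumed shape: "<model> <effort> · <path>" occupying the whole line.
# Anchoring with ^ and $ keeps prose that merely mentions a model name
# from being mistaken for the status bar.
FOOTER_RE = re.compile(
    r"^\s*(?P<model>[A-Za-z0-9][\w.\-]*)\s+(?P<effort>[A-Za-z]+)\s+·\s+(?P<cwd>[/~].*)$"
)


def is_footer_line(line: str) -> bool:
    """Return True if the line looks like a status-bar footer."""
    return FOOTER_RE.match(line) is not None


def last_footer_index(lines: List[str]) -> Optional[int]:
    """Return the index of the last footer-shaped line, or None if absent."""
    for i in range(len(lines) - 1, -1, -1):
        if is_footer_line(lines[i]):
            return i
    return None
```

A line that starts with a bullet or embeds the same tokens mid-sentence does not match, which is the property the anchoring is meant to provide.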

Testing

uv run pytest test/providers/test_codex_provider_unit.py test/services/test_settings_service.py test/test_constants.py
125 passed

uv run pytest test/ --ignore=test/providers/test_q_cli_integration.py --ignore=test/providers/test_kiro_cli_integration.py --ignore=test/e2e -m 'not e2e'
1643 passed, 1 skipped, 93% coverage

uv run python -m compileall -q src test/test_constants.py test/providers/test_codex_provider_unit.py test/services/test_settings_service.py
uv run black --check src/ test/
uv run isort --check-only src/ test/
uv run mypy src/cli_agent_orchestrator/constants.py src/cli_agent_orchestrator/providers/codex.py src/cli_agent_orchestrator/services/settings_service.py
git diff --check
uv build

cd web && npm ci
cd web && npm test
cd web && npx tsc --noEmit
cd web && npm run build

uv run mypy src/ still reports 21 repository-wide pre-existing errors outside the touched source files; the touched-source mypy gate above passes.

Live Smoke

  • CAO server: 127.0.0.1:9899
  • Supervisor profile: pilot_codex_supervisor
  • Worker profile: pilot_codex_inspector
  • Result: cao-mcp-server.handoff returned success in 28.06s
  • Worker output confirmed the visible top-level entry SESSION_STATE.md and reported "SESSION_STATE.md exists: yes"

Push And PR Commands

Use these once GitHub auth and a fork/remote are available:

cd "/Volumes/P8/Codex projects/agent-orchestration-eval/repos/cli-agent-orchestrator"

# If needed, create a fork in GitHub first, then add it locally:
git remote add fork https://github.com/<your-github-user>/cli-agent-orchestrator.git

git push -u fork codex/cao-codex-handoff-fixes

Then open:

https://github.com/awslabs/cli-agent-orchestrator/compare/main...<your-github-user>:cli-agent-orchestrator:codex/cao-codex-handoff-fixes?expand=1

Alternative with GitHub CLI after installing/authenticating gh:

gh pr create \
  --repo awslabs/cli-agent-orchestrator \
  --base main \
  --head <your-github-user>:codex/cao-codex-handoff-fixes \
  --title "Harden Codex handoff completion and state isolation" \
  --body-file "/Volumes/P8/Codex projects/agent-orchestration-eval/CAO_PR_DRAFT_2026-05-13.md" \
  --draft

Offline Patch Artifact

Two-commit mailbox patch:

/Volumes/P8/Codex projects/agent-orchestration-eval/patches/cao-codex-handoff-fixes-2commits.mbox

@dechow73-maker

Author

Operator workflow note saved from the local orchestration run, not part of the requested upstream source diff:

  • Use GPT-5.5 / GPT-5.5 Extra High as the central planning, compaction, adjudication, risk-control, and final-synthesis layer.
  • Use Qwen 3.6 7B/14B/27B only as verified bounded execution lanes: 7B for cheap extraction/mechanical checks, 14B for structured summaries/checklists, 27B for heavier synthesis or secondary critique.
  • Use Moonbridge only after route verification as a local routing/fallback layer; do not treat it as final authority.
  • Use subagents only for narrow verification, audit, extraction, test execution, file inspection, citation checking, or compact context summaries.
  • Treat all lane/subagent output as provisional until GPT-5.5 checks it against source/context; compare by evidence and tests, never by majority vote.
  • For recurring agent supervision, use a thread heartbeat automation: check the target agent on the requested interval and respond with exactly "next" only if the target has completed and its next action has been read; otherwise stay quiet and keep the heartbeat active.
  • Escalate to the human only for missing credentials, destructive/irreversible operations, sensitive data export, costly fanout/retries, raw transcript logging, or material ambiguity.
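
The heartbeat rule in the list above reduces to a small decision function. The sketch below is illustrative only: `heartbeat_reply` and its parameters are hypothetical names, and how completion and read state are detected is left out.

```python
from typing import Optional


def heartbeat_reply(target_completed: bool, next_action_read: bool) -> Optional[str]:
    """Decide what one heartbeat tick should send.

    Returns the literal reply "next" only when both conditions hold;
    returns None to stay quiet and leave the heartbeat running.
    """
    if target_completed and next_action_read:
        return "next"
    return None
```

Returning None rather than a status message keeps the thread quiet between completions, as the note requires.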

Local reusable skill saved at: /Users/lianzhong/.codex/skills/agent-continuation-supervisor/SKILL.md.

@codecov-commenter

codecov-commenter commented May 14, 2026

Codecov Report

❌ Patch coverage is 91.57895% with 8 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@b57a064). Learn more about missing BASE report.

Files with missing lines | Patch % | Lines
src/cli_agent_orchestrator/providers/codex.py | 92.59% | 6 Missing ⚠️
...li_agent_orchestrator/services/settings_service.py | 84.61% | 2 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #235   +/-   ##
=======================================
  Coverage        ?   92.78%           
=======================================
  Files           ?       65           
  Lines           ?     5504           
  Branches        ?        0           
=======================================
  Hits            ?     5107           
  Misses          ?      397           
  Partials        ?        0           
Flag | Coverage Δ
unittests | 92.78% <91.57%> (?)

@haofeif added the enhancement (New feature or request) label May 14, 2026