Event-driven architecture: rebase onto main + green the suite (continues #115) by call-me-ram · Pull Request #273 · awslabs/cli-agent-orchestrator

call-me-ram · 2026-06-03T15:12:11Z

Summary

Continues @tuanknguyen's event-driven architecture work from #115, per @haofeif's suggestion there. That branch had fallen ~98 commits behind main and was failing CI (a stale INBOX_POLLING_INTERVAL import left after the constant was removed). This merges current main into it, resolves the conflicts, and gets the full unit suite passing. It's also the "unified delivery engine" I referenced in #266.

What it does

Replaces the watchdog/polling model for terminal output, status detection, and inbox delivery with an in-process event bus:

FifoReader ──▶ EventBus ──▶ { StatusMonitor, LogWriter }
                StatusMonitor ──▶ EventBus ──▶ InboxService

Terminal output streams through named FIFOs (tmux pipe-pane) rather than being polled from log files.
StatusMonitor is the single source of truth for status; providers' get_status() now parse a buffer string instead of reading tmux themselves.
InboxService delivers on status events; the watchdog PollingObserver/LogFileHandler is removed.
Provider initialize() and terminal_service.create_terminal() are now async.

Reconciling with `main`

All features that landed after the branch forked are preserved on top of the event-driven base:

Eager inbox delivery (feat: eager inbox delivery for providers that buffer input during processing #251) — re-expressed as an eager gate in InboxService.deliver_pending.
OpenCode poller — kept as a status-driven poll_opencode_pending_messages (OpenCode's output goes quiet once idle, so the event bus alone may not wake delivery).
Env-var forwarding (feat(launch): forward env vars to supervisor and child agents #259), CORS auto-derive (feat(api): auto-derive CORS origins from cao-server --host/--port #261), DNS host validation (fix(api): add DNS rebinding protection via Host header validation #124), memory curation (feat(memory): SQLite metadata, BM25 fallback, context-manager injection #254/feat(memory): Phase 2.5 hardening #262), self-send guard (Worker Agent Sending Message Itself Upon Completion of Assignment #24), skills, plugin events — retained and wired through the async/buffer contract.
Converted copilot_cli / opencode_cli to the async initialize + buffer get_status contract.

Testing

Full unit suite green — 1908 passed, 31 skipped (pytest test/ --ignore=*_integration.py --ignore=test/e2e -m "not e2e"), Python 3.11.
black --check and isort --check-only clean.

Supersedes #115. Related: #266.

Bring the event-driven architecture branch up to date with main (98 commits) and reconcile the rewrite with features that landed after it forked: eager inbox delivery (awslabs#251), the OpenCode poller, env-var forwarding (awslabs#259), memory curation (awslabs#254/awslabs#262), CORS auto-derive (awslabs#261), DNS host validation (awslabs#124), and the self-send guard (awslabs#24). Highlights: - Providers adopt the async initialize() + get_status(buffer) contract; copilot_cli/opencode_cli converted; kiro keeps colour-only ANSI stripping so carriage-return-redraw permission prompts aren't misread as idle. - Event-driven InboxService.deliver_pending with the awslabs#251 eager gate and message-sender attribution; OpenCode poller retained as a status-driven method; the watchdog (PollingObserver/LogFileHandler) is removed. - terminal_service.create_terminal is async (FIFO + StatusMonitor wiring); session_service.create_session, flow_service.execute_flow, the API endpoints, and `cao flow run` updated to await. - memory_service curated path and the flow CLI fixed to the new contract. Full unit suite green (1908 passed); black + isort clean.

codecov-commenter · 2026-06-03T23:48:50Z

Codecov Report

❌ Patch coverage is 77.71346% with 154 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@7a7d1e1). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
src/cli_agent_orchestrator/services/event_bus.py	28.57%	50 Missing ⚠️
src/cli_agent_orchestrator/services/fifo_reader.py	44.61%	36 Missing ⚠️
.../cli_agent_orchestrator/services/status_monitor.py	81.72%	17 Missing ⚠️
src/cli_agent_orchestrator/services/log_writer.py	47.82%	12 Missing ⚠️
...li_agent_orchestrator/services/terminal_service.py	72.72%	9 Missing ⚠️
src/cli_agent_orchestrator/utils/terminal.py	90.90%	5 Missing ⚠️
...rc/cli_agent_orchestrator/services/flow_service.py	69.23%	4 Missing ⚠️
src/cli_agent_orchestrator/api/main.py	86.95%	3 Missing ⚠️
src/cli_agent_orchestrator/providers/codex.py	85.00%	3 Missing ⚠️
...rc/cli_agent_orchestrator/providers/copilot_cli.py	75.00%	3 Missing ⚠️
... and 8 more

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #273   +/-   ##
=======================================
  Coverage        ?   89.49%           
=======================================
  Files           ?       84           
  Lines           ?     8425           
  Branches        ?        0           
=======================================
  Hits            ?     7540           
  Misses          ?      885           
  Partials        ?        0

Flag	Coverage Δ
unittests	`89.49% <77.71%> (?)`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copilot

Pull request overview

This PR rebases and completes the event-driven architecture migration by replacing watchdog/log-file polling with an in-process EventBus pipeline that streams tmux output through FIFOs, derives terminal status from a rolling buffer, and triggers inbox delivery off status-change events. It also converts provider initialization and terminal/session creation flows to async and updates the unit/integration tests accordingly.

Changes:

Introduces EventBus + FifoManager + StatusMonitor + LogWriter + InboxService pipeline, and removes watchdog-based log watching (and related deps).
Converts terminal/session/flow execution and provider initialize() to async; updates status detection to consume a buffer string instead of querying tmux directly.
Updates API/CLI wiring and large portions of the test suite to match the new async + buffer-driven contracts.

Reviewed changes

Copilot reviewed 57 out of 58 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
uv.lock	Removes `watchdog`/`aiofiles` from the locked dependency set.
pyproject.toml	Drops `watchdog`/`aiofiles` runtime dependencies.
CODEBASE.md	Updates architecture diagrams/flow docs for FIFO + event bus pipeline.
docs/api.md	Updates inbox delivery description to event-driven status detection.
docs/event-driven-architecture.md	Adds detailed documentation for the new event-driven pipeline.
src/cli_agent_orchestrator/api/main.py	Lifespan now starts event-bus consumers and awaits async services.
src/cli_agent_orchestrator/cli/commands/flow.py	Drives async `execute_flow()` from sync Click command via `asyncio.run()`.
src/cli_agent_orchestrator/constants.py	Adds FIFO/event-bus/state-buffer constants; repurposes polling interval.
src/cli_agent_orchestrator/models/terminal.py	Adds `TerminalStatus.UNKNOWN`.
src/cli_agent_orchestrator/providers/base.py	Makes `initialize()` async and changes `get_status()` signature to `get_status(buffer)`.
src/cli_agent_orchestrator/providers/claude_code.py	Async init; status detection consumes buffer instead of tmux reads.
src/cli_agent_orchestrator/providers/codex.py	Async init and buffer-based status detection; async trust prompt handling.
src/cli_agent_orchestrator/providers/copilot_cli.py	Async init; status detection consumes buffer string.
src/cli_agent_orchestrator/providers/gemini_cli.py	Async init and buffer-based status detection; keeps warmup logic.
src/cli_agent_orchestrator/providers/kimi_cli.py	Async init and buffer-based status detection.
src/cli_agent_orchestrator/providers/kiro_cli.py	Async init; buffer-based status detection; improved escape stripping for extraction.
src/cli_agent_orchestrator/providers/opencode_cli.py	Async init; buffer-based status detection; ERROR fallback becomes UNKNOWN.
src/cli_agent_orchestrator/providers/q_cli.py	Async init; buffer-based status detection; removes log-idle helper.
src/cli_agent_orchestrator/services/cleanup_service.py	Stops FIFO readers and clears status buffers during retention cleanup.
src/cli_agent_orchestrator/services/event_bus.py	Adds thread-safe in-process pub/sub with wildcard subscriptions.
src/cli_agent_orchestrator/services/fifo_reader.py	Adds per-terminal FIFO reader threads publishing output events.
src/cli_agent_orchestrator/services/flow_service.py	Makes `execute_flow()` async; busy-check now uses `status_monitor`.
src/cli_agent_orchestrator/services/inbox_service.py	Replaces watchdog delivery with status-event-driven delivery logic.
src/cli_agent_orchestrator/services/log_writer.py	Persists streamed output chunks to per-terminal logs.
src/cli_agent_orchestrator/services/memory_service.py	Curator liveness checks now use `status_monitor`.
src/cli_agent_orchestrator/services/session_service.py	Makes `create_session()` async; delete path delegates terminal teardown.
src/cli_agent_orchestrator/services/status_monitor.py	Adds rolling buffer + provider-driven status derivation + status events.
src/cli_agent_orchestrator/services/terminal_service.py	Makes `create_terminal()` async; wires FIFO + pipe-pane; status/output sources updated.
src/cli_agent_orchestrator/utils/event.py	Adds helper to extract terminal id from topic names.
src/cli_agent_orchestrator/utils/terminal.py	Makes `wait_for_shell`/`wait_until_status` async and status-monitor-based; switches polling client to `requests`.
src/cli_agent_orchestrator/utils/text.py	Adds robust terminal escape/control stripping utility.
test/api/test_api_endpoints.py	Updates async service mocks for endpoints; updates lifespan expectations.
test/api/test_flow_api.py	Updates flow API tests for awaited async execution.
test/api/test_plugin_lifespan.py	Updates lifespan test stubs for new consumer tasks and OpenCode poller.
test/api/test_terminals.py	Updates endpoint tests for async service methods and new inbox delivery method.
test/cli/commands/test_flow.py	Updates CLI flow tests for async `execute_flow()` mocking.
test/providers/conftest.py	Adds shared integration fixtures bootstrapping event pipeline + DB mocks.
test/providers/test_base_provider.py	Updates base provider tests for async init + buffer-based status contract.
test/providers/test_copilot_cli_unit.py	Updates status tests for buffer contract; adjusts init tests.
test/providers/test_kimi_cli_unit.py	Updates init/status tests for async + buffer contract.
test/providers/test_kiro_cli_integration.py	Refactors integration tests to use real create_terminal FIFO pipeline.
test/providers/test_opencode_cli_unit.py	Updates fixtures/tests for buffer contract and async init.
test/providers/test_permission_prompt_detection.py	Refactors Kiro permission prompt tests to pass fixture output into `get_status(output)`.
test/services/test_cleanup_service.py	Updates cleanup tests for new FIFO/status monitor interactions.
test/services/test_context_manager_agent.py	Updates memory curator tests to use status_monitor status gating.
test/services/test_flow_service.py	Updates flow service tests for async execute path and status_monitor busy check.
test/services/test_plugin_event_emission.py	Updates event emission tests for async create_* and inbox delivery refactor.
test/services/test_session_service.py	Updates session service tests for async create_session and new teardown delegation.
test/services/test_terminal_service_coverage.py	Updates terminal service cleanup-path coverage for async init + FIFO/status cleanup.
test/utils/test_terminal.py	Updates terminal util tests for async waiters + requests patching.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+    # Start event bus consumers as background tasks
+    status_monitor_task = asyncio.create_task(status_monitor.run())
+    log_writer_task = asyncio.create_task(log_writer.run())
+    inbox_service_task = asyncio.create_task(inbox_service.run())
+    logger.info("Event bus consumers started (StatusMonitor, LogWriter, InboxService)")


+### Status Monitor (`services/status_monitor.py`) — Publisher + Consumer
+
+Subscribes to `terminal.*.output`. Accumulates output into a rolling buffer (8KB) per terminal, detects status via the registered provider (or a generic shell prompt pattern before init), and publishes `terminal.{id}.status` on change. Also the source of truth for current terminal status.
+


haofeif · 2026-06-03T23:53:44Z

@tuankn22 @tuanknguyen @patricka3125 @anilkmr-a2z can you please help to review this PR ? @anilkmr-a2z i saw you have another PR #271 , are there any duplication ?

Resolve the codex test conflict introduced by awslabs#274 (codex v0.136 TUI footer + MCP tool-call marker filtering). awslabs#274 added its get_status regressions against the old tmux-reading contract (mock_tmux.get_history + get_status()); adapt them to the event-driven get_status(buffer) contract this branch introduces. codex.py auto-merged cleanly — the awslabs#274 regex/extraction fixes layer onto buffer-based get_status unchanged. Full unit suite green: 1948 passed, 1 skipped.

Resolve four review comments on PR awslabs#273: - InboxService.run() ran deliver_pending() synchronously inside the asyncio consumer loop, blocking it on DB + tmux I/O and starving the StatusMonitor/LogWriter consumers. Offload delivery via asyncio.to_thread, matching the threading discipline documented for the event bus and the existing OpenCode poller. - Event-driven deliveries were started without a PluginRegistry, so they skipped PostSendMessageEvent hooks. Thread the registry through run() -> deliver_pending so status-driven deliveries get the same plugin attribution as the immediate and OpenCode-poller paths. - FifoManager.stop_reader() early-returned when no in-memory reader was tracked, never unlinking the FIFO. Retention cleanup after a restart (empty _readers) would leak *.fifo files unbounded. Always best-effort unlink the FIFO; only log when a reader was actually stopped. - Docs claimed StatusMonitor falls back to a generic shell-prompt pattern before init; _detect_status returns UNKNOWN until a provider registers. Correct the docs to match. Adds regression tests for the registry threading, the to_thread offload, and the stale-FIFO unlink.

Reconcile the event-driven inbox rewrite with awslabs#265 and awslabs#266, which just landed on main and edit the same files: - inbox_service.py: keep the event-driven InboxService, but apply awslabs#265's ordering — mark messages DELIVERED before send_input so output echoed back through the FIFO/StatusMonitor pipeline can't re-enter deliver_pending and double-deliver a still-PENDING message (awslabs#164). Add awslabs#266's reconcile_orphaned_messages as a method that routes stale PENDING receivers back through deliver_pending (awslabs#131). - api/main.py: keep the event-bus consumers + OpenCode poller; add awslabs#266's inbox_reconciliation_daemon task (start + cancel). Drop awslabs#266's watchdog PollingObserver wiring — the event bus replaced it here. - constants.py: keep both INBOX_POLLING_INTERVAL and awslabs#266's INBOX_RECONCILE_INTERVAL/GRACE; reword the comments off the watchdog. - tests: adapt awslabs#265's ordering regression and awslabs#266's reconcile + lifespan tests to the event-driven class API (deliver_pending, status_monitor, the singleton's bound method); drop the watchdog LogFileHandler tests. Full unit suite green: 1958 passed, 1 skipped.

anilkmr-a2z · 2026-06-04T17:04:08Z

@haofeif That PR actually doesn't change tmux integ but add support for Herdr which provide these notification already. But yeah, I think it would be good to change tmux to event driven too, if we can. I have noticed the machine gets slow down as hell when there are lot of terminals. So, trying to find solution to make it efficient.

call-me-ram · 2026-06-08T01:08:37Z

Edited 2026-06-08: (1) the Kimi issue has since been diagnosed and fixed — it was a Kimi-side TUI redesign, not a regression — and verified end-to-end on the real Kimi CLI (see the Kimi item below); (2) corrected two minor details in an earlier draft (test count; the Codex vs Kiro fixes were conflated). This branch was also merged with main to pick up the herdr backend (#271) — conflicts resolved, full unit suite green.

@haofeif — thank you for the exceptionally detailed review. Pushed a fix (commits 0f05af8..af8e96c on the branch).

Root cause confirmed

You were exactly right: StatusMonitor feeds the raw pipe-pane buffer (cursor-positioning escapes, in-place redraws, OSC titles) to provider.get_status(), but the detectors were calibrated on tmux-rendered snapshots. I reproduced it end-to-end on real CLIs and replayed the captured buffers through get_status to confirm.

What changed

claude_code / codex / kimi_cli get_status() now run the buffer through utils.text.strip_terminal_escapes() before their structural checks (cursor moves → newlines, escapes/OSC/control stripped). Documented the raw-buffer input contract on BaseProvider.get_status. Also hardened strip_terminal_escapes to turn forward column moves (\x1b[<n>G / \x1b[<n>C) into a single space, so column-laid-out text doesn't glue together (Workedfor3s → Worked for 3s).
kiro_cli (your P2): clears the StatusMonitor buffer (new StatusMonitor.reset_buffer) on the TUI → --legacy-ui fallback, so the retry isn't derived from stale TUI bytes.
Newest Claude Code TUI (found while reproducing): the current Claude Code shipped a redesigned TUI — no ⏺ marker; the ❯ prompt is boxed between two separators; the live spinner renders above the box and cycles a glyph set that includes a bare ASCII *; completion shows as ✻ <Verb>ed for Ns and responses use ● (U+25CF) instead of ⏺. claude_code now handles all of that — PROCESSING is anchored to the spinner on the line directly above the box that contains the last ❯ (rejects stale spinners and mid-buffer separator regions), COMPLETED is robust to clipped/●-only frames, and extraction recognizes ●. It's gated on the new box / line-start anchors, so older builds keep the legacy ⏺ path unchanged.
Input submission (newest Claude Code): the Ink renderer swallowed an Enter sent too soon after a bracketed paste, leaving the message unsubmitted. Made the post-paste delay provider-tunable (BaseProvider.paste_submit_delay; Claude overrides it).

Verified ✅

Full existing unit suite stays green (2036 passed locally / 2039 in the full dev run, + ~24 new regression tests — ≈18 status/extraction in test_claude_code_unit.py, 6 for strip_terminal_escapes in test_text.py — and the real raw capture as a fixture) → no regression to the versions those tests cover.
Real end-to-end on Claude Code 2.1.168: TestClaudeCodeHandoff (greet + multiply) and TestClaudeCodeAssign (data_analyst, report_generator, callback) pass against a cao-server built from this branch.

Needs your check ⚠️ (I don't have these CLIs locally)

Codex: I applied the documented escape-strip (strip_terminal_escapes in CodexProvider.get_status) and kept every existing unit test green, but I can't run its real e2e. Could you share the captured raw buffers you replayed through CodexProvider.get_status / ClaudeCodeProvider.get_status? I'll add them as regression fixtures and verify against your exact inputs. On Codex specifically — you flagged the idle › glyph may be cursor-drawn and absent from the raw stream; if your buffer confirms that, the strip alone won't surface it and I'll add footer-based idle detection.
Kiro: the fix here is the --legacy-ui buffer reset (new StatusMonitor.reset_buffer on the TUI → --legacy-ui fallback), not the escape-strip — Kiro deliberately keeps its SGR-only strip (a strip_terminal_escapes pass would break its \r-redraw permission detection). Existing unit tests stay green; I can't run its real e2e.
Kimi — resolved ✅ (now tested on the real CLI). Root cause found and fixed. The newest Kimi CLI (the "Kimi Code" rebuild, v1.47.0) redesigned its TUI: it dropped the ✨/💫 prompt for a boxed ── input ── area + an agent (<model> ●) status bar + a context: N% footer, with a braille-spinner working indicator (⠇ Thinking… Ns · N tokens). get_status() keyed on the old emoji prompt, so it never observed IDLE and init timed out — i.e. a Kimi-side TUI redesign, not a regression from the event-driven / raw-buffer change (the legacy detector fails the same way on any branch). Fixed get_status() to detect the new TUI, gated on the new markers so legacy emoji builds are unchanged, with position-based spinner-vs-ready detection that is robust to stale spinner frames left in the rolling buffer, plus real-capture regression fixtures. Verified on kimi 1.47.0 against a cao-server built from this branch: TestKimiCliHandoff (2) + TestKimiCliAssign (3) pass, and Claude ↔ Kimi cross-provider assign passes both directions (added TestCrossProviderClaudeToKimi / TestCrossProviderKimiToClaude). Fix in commit e5d185e.

Looking ahead — a proposal to stop re-tuning `get_status` on every TUI redesign

The whole class of bug here comes from guessing state by scraping the screen — so every CLI redesign breaks it (which is exactly what the newest Claude Code did). A more robust model is to have the agent/CLI signal its state to CAO:

Best — CLI-native hooks: e.g. Claude Code's Stop / Notification hooks → POST to CAO (CAO already writes Claude settings, so wiring this is straightforward). The CLI itself says "turn done" / "needs input" — version-stable and TUI-independent.
Fallback — an MCP report_status tool: CAO already gives every agent an MCP channel.
PROCESSING is free (CAO knows it sent the input); the hard signals (DONE / WAITING) come from the hook.

This lines up with @anilkmr-a2z's Herdr / #271 direction. If you're open to it, I'm happy to open a separate issue with a concrete design — it could eventually retire most of the per-TUI scraping in get_status.

Thanks again — genuinely one of the most useful reviews I've had.

…awslabs#273) Reconcile the two concurrent refactors of the same providers/services: PR awslabs#271 added a terminal backend abstraction (tmux/herdr) with polling get_status; this branch (awslabs#273) replaced polling with an event-driven FIFO -> EventBus -> StatusMonitor/LogWriter/InboxService pipeline and async, buffer-based provider status detection. Resolution keeps the event-driven design as primary and layers in the backend abstraction: - Route terminal I/O through get_backend() (TmuxBackend delegates to tmux_client, so the default path is behaviorally unchanged) while keeping the FIFO pipe-pane stream as the tmux event source. - Guard FIFO setup in create_terminal with `not get_backend().supports_event_inbox()`; herdr uses its socket-event inbox. Keep herdr inbox registration additively (no-op for tmux). - Providers keep async initialize() and buffer-based get_status(output); claude_code keeps its newest-TUI parser plus herdr's native-status short-circuit (the get_history poll inside get_status is removed). - utils.terminal wait_for_shell/wait_until_status stay async + terminal_id based (StatusMonitor-driven). - API lifespan starts the EventBus consumers unconditionally and the herdr inbox service only for HerdrBackend; the removed watchdog PollingObserver branch is dropped. - inbox_service.deliver_pending keeps the event-driven readiness gate and resets herdr TerminalNotFoundError deliveries to PENDING for retry. - get_output(FULL) returns the StatusMonitor buffer; get_output(LAST) keeps awslabs#271's escalating fetch via get_backend(). Also: add submit_delay to the backend send_keys interface (Claude paste timing); update awslabs#271 net-new tests for async create_terminal / the rewired lifespan / the send_keys signature; fix pre-existing herdr mypy issues. Full unit suite green (2198 passed, 17 skipped); mypy, black and isort clean.

The newest Kimi CLI (the "Kimi Code" rebuild) replaced the ✨/💫 emoji prompt with a boxed input area ("── input ──"), an "agent (<model> ●)" status bar, a "context: N%" usage footer, and a braille-spinner working indicator ("⠧ Thinking… Ns · N tokens"). The legacy detector keyed on the emoji prompt, so it never observed IDLE and every Kimi terminal timed out at init — this was the failure flagged on awslabs#273 (now confirmed to be a Kimi-side TUI redesign, not a regression from the event-driven refactor). get_status now detects the new TUI (gated on the new status/footer markers so legacy emoji builds are unchanged): - READY = status bar / context footer present with no live spinner. - PROCESSING vs READY is decided by POSITION (last braille spinner line vs last ready-chrome line), so stale spinner frames lingering in the 8KB rolling buffer don't pin a finished turn at PROCESSING. - COMPLETED vs IDLE latches on a "•" bullet (a turn produced output); the welcome banner / update nag have none, so a fresh terminal reads IDLE and avoids a premature-completion race when the first task is sent. Adds regression tests driven by real captured raw pipe-pane buffers (idle / processing / completed) of the new TUI. Also enables Claude <-> Kimi cross-provider e2e: - examples/cross-provider/data_analyst_kimi_cli.md (provider: kimi_cli). - TestCrossProviderClaudeToKimi / TestCrossProviderKimiToClaude. - Fix the cross-provider test helper to resolve the worker's provider from its profile (the add-terminal endpoint treats an explicit provider param as authoritative, which silently suppressed the profile override). Verified on real CLIs (kimi 1.47.0 + claude 2.1.x) against a cao-server built from this branch: - TestKimiCliHandoff (2) + TestKimiCliAssign (3): pass. - Claude->Kimi and Kimi->Claude cross-provider assign: pass. Full unit suite green (2202 passed); mypy / black / isort clean.

Copilot

Pull request overview

Copilot reviewed 75 out of 76 changed files in this pull request and generated 8 comments.

+async def wait_for_shell(
+    terminal_id: str,
    timeout: float = 10.0,
-    polling_interval: float = 0.5,
+    stable_duration: float = 2.0,
+    polling_interval: float = 0.3,
 ) -> bool:
-    """Wait for shell to be ready by checking if output is stable (2 consecutive reads are the same and non-empty)."""
-    logger.info(f"Waiting for shell to be ready in {session_name}:{window_name}...")
-    start_time = time.time()
-    previous_output = None
+    """Wait for shell to be ready by checking if the output buffer is stable and non-empty.

-    while time.time() - start_time < timeout:
-        output = backend.get_history(session_name, window_name)
+    Reads the StatusMonitor's in-memory buffer (populated by the FIFO reader
+    → event bus → StatusMonitor pipeline). Returns True when the buffer is
+    non-empty and has not changed for *stable_duration* seconds.
+
+    This does NOT use provider-specific status detection because the provider
+    is already registered before initialize() runs, and provider patterns
+    don't match raw shell output.
+    """
+    from cli_agent_orchestrator.services.status_monitor import status_monitor
+
+    logger.info(f"Waiting for shell to be ready for terminal {terminal_id}...")
+
+    deadline = time.time() + timeout


+async def wait_until_status(
+    terminal_id: str,
    target_status: "TerminalStatus | set[TerminalStatus]",
    timeout: float = 30.0,
    polling_interval: float = 1.0,
 ) -> bool:
-    """Wait until provider reaches target status or timeout."""
-    targets = target_status if isinstance(target_status, set) else {target_status}
-    start_time = time.time()
+    """Wait until terminal reaches target status by polling status_monitor."""
+    from cli_agent_orchestrator.services.status_monitor import status_monitor

-    while time.time() - start_time < timeout:
-        status = provider_instance.get_status()
-        target_str = ", ".join(s.value for s in targets)
-        logger.info(f"Waiting for {{{target_str}}}, current status: {status}")
-        if status in targets:
+    targets = target_status if isinstance(target_status, set) else {target_status}
+    target_str = ", ".join(s.value for s in targets)
+    logger.info(
+        f"wait_until_status [{terminal_id}]: waiting for {{{target_str}}}, timeout={timeout}s"
+    )
+    start = time.time()
+    while time.time() - start < timeout:
+        current = status_monitor.get_status(terminal_id)
+        if current in targets:
+            logger.info(f"wait_until_status [{terminal_id}]: reached {current.value}")
            return True
-        time.sleep(polling_interval)
-
+        await asyncio.sleep(polling_interval)
+    logger.warning(f"wait_until_status [{terminal_id}]: timeout waiting for {{{target_str}}}")
    return False


+            old_terminals = (
+                db.query(TerminalModel).filter(TerminalModel.last_active < cutoff_date).all()
+            )
+            for terminal in old_terminals:
+                fifo_manager.stop_reader(terminal.id)
+                status_monitor.clear_terminal(terminal.id)


+    def set_loop(self, loop: asyncio.AbstractEventLoop) -> None:
+        """Register the asyncio event loop (required for thread-safe publishing)."""
+        self._loop = loop
+
+    def publish(self, topic: str, data: dict) -> None:
+        """Publish event to all matching subscribers. Safe to call from any thread."""
+        if self._loop:
+            self._loop.call_soon_threadsafe(self._dispatch, topic, data)
+


+import logging
+from typing import Dict
+
+from cli_agent_orchestrator.constants import STATE_BUFFER_MAX
+from cli_agent_orchestrator.models.terminal import TerminalStatus
+from cli_agent_orchestrator.providers.manager import provider_manager
+from cli_agent_orchestrator.services.event_bus import bus
+from cli_agent_orchestrator.utils.event import terminal_id_from_topic
+
+logger = logging.getLogger(__name__)
+
+
+class StatusMonitor:
+    """Accumulates terminal output into rolling buffers and detects status changes."""
+
+    def __init__(self):
+        self._buffers: Dict[str, str] = {}
+        self._last_status: Dict[str, TerminalStatus] = {}
+


+    def _process_chunk(self, terminal_id: str, chunk: str) -> None:
+        """Append chunk to rolling buffer and check for status changes."""
+        if terminal_id not in self._buffers:
+            self._buffers[terminal_id] = ""
+        self._buffers[terminal_id] += chunk
+
+        if len(self._buffers[terminal_id]) > STATE_BUFFER_MAX:
+            self._buffers[terminal_id] = self._buffers[terminal_id][-STATE_BUFFER_MAX:]
+
+        new_status = self._detect_status(terminal_id, self._buffers[terminal_id])
+
+        if new_status != self._last_status.get(terminal_id):
+            bus.publish(f"terminal.{terminal_id}.status", {"status": new_status.value})
+            logger.info(f"Terminal {terminal_id} status changed: {new_status.value}")
+            self._last_status[terminal_id] = new_status
+


+    def clear_terminal(self, terminal_id: str) -> None:
+        """Free buffer and status for a deleted terminal."""
+        self._buffers.pop(terminal_id, None)
+        self._last_status.pop(terminal_id, None)
+
+    def reset_buffer(self, terminal_id: str) -> None:
+        """Clear the rolling buffer + last-known status WITHOUT forgetting the
+        terminal.
+
+        Used when a provider relaunches a different CLI mode on the SAME
+        ``terminal_id`` (e.g. Kiro's TUI -> ``--legacy-ui`` fallback). Without
+        this, the retry re-derives status from a buffer still full of stale bytes
+        from the failed first attempt and can spuriously time out.
+        """
+        self._buffers[terminal_id] = ""
+        self._last_status.pop(terminal_id, None)
+
+    def get_status(self, terminal_id: str) -> TerminalStatus:
+        """Get current terminal status. Source of truth — derived from streaming output."""
+        return self._last_status.get(terminal_id, TerminalStatus.UNKNOWN)
+
+    def get_buffer(self, terminal_id: str) -> str:
+        """Get accumulated output buffer for a terminal."""
+        return self._buffers.get(terminal_id, "")


+    @staticmethod
+    def _write(path, data: str) -> None:
+        with open(path, "a") as f:
+            f.write(data)


…n herdr The event-driven rewrite made the FIFO -> EventBus -> StatusMonitor pipeline the sole source of shell-readiness and terminal status. The herdr backend deliberately skips that pipeline (its pipe_pane is a no-op; no FIFO reader is started for it, since it delivers via socket events), so for a herdr terminal the StatusMonitor buffer stayed "" and status stayed UNKNOWN forever. As a result provider.initialize() timed out in wait_for_shell()/wait_until_status(), and every status read (API GET /terminals/{id}, busy checks, provider init loops) returned UNKNOWN -- terminal creation was broken for essentially every provider on herdr. Fix at the single chokepoint instead of per-call-site: - status_monitor.get_status() is now backend-aware: for event-inbox backends it derives status on demand from the provider's get_status() (which consults backend.get_native_status()). This fixes all read sites at once -- API status, wait_until_status(), flow busy checks, curator liveness, and the copilot/gemini init loops -- without each having to special-case the backend. - wait_for_shell() reads backend.get_history() directly for event-inbox backends rather than the never-populated StatusMonitor buffer. - wait_until_status() reverts to simply polling the now backend-aware get_status(), removing the duplicated backend logic. The tmux path is unchanged (supports_event_inbox() is False -> same pushed-status read as before). Verified end-to-end against a real herdr 0.6.8 server: a live Claude Code agent launched in a herdr pane, reached idle, and completed a task. Adds test/services/test_status_monitor.py and herdr cases to test/utils/test_terminal.py.

call-me-ram · 2026-06-09T06:21:43Z

Addressed: herdr backend terminal-init hang (Copilot review — `utils/terminal.py`)

Copilot flagged that wait_for_shell() / wait_until_status() consult only the FIFO-driven StatusMonitor, which the herdr backend never feeds — so provider.initialize() would time out on herdr. Confirmed and fixed in 9caea9a.

Root cause (single source of truth wasn't backend-aware). The event-driven rewrite made the FIFO → EventBus → StatusMonitor pipeline the sole source of shell-readiness and status. herdr deliberately skips that pipeline (no-op pipe_pane, no FIFO reader — it delivers via socket events), so for a herdr terminal the StatusMonitor buffer stayed "" and status stayed UNKNOWN forever. This broke initialize() and every status read (GET /terminals/{id}, busy checks, the copilot_cli/gemini_cli init loops), so terminal creation failed for essentially every provider on herdr.

Fix — at the chokepoint, not per-call-site:

status_monitor.get_status() is now backend-aware: for event-inbox backends it derives status on demand from the provider's get_status() (which consults backend.get_native_status()). One change fixes all status read sites.
wait_for_shell() reads backend.get_history() directly for event-inbox backends instead of the never-populated buffer.
wait_until_status() reverts to polling the now backend-aware get_status() (no duplicated backend logic).

tmux path is unchanged (supports_event_inbox() is False).

Verified end-to-end against a real herdr 0.6.8 server — a live Claude Code agent launched in a herdr pane, reached idle, and completed a task:

Shell ready for 643029c9 (buffer stable, 236 bytes)
wait_until_status [643029c9]: reached idle
Created terminal: 643029c9 in session: cao-b127e8de
# task "6 * 7" -> agent replied: 42

Adds test/services/test_status_monitor.py and herdr cases in test/utils/test_terminal.py; full suite green.

Note: the other except Exception "swallows CancelledError" items in the review are not reproducible — asyncio.CancelledError derives from BaseException, not Exception, so those loops cancel cleanly on shutdown.

anilkmr-a2z

Review — Event-Driven Architecture + Herdr Integration Check

Mostly lgtm. Reviewed the full diff with focus on herdr backend compatibility and past issues (pane resolution, stale events, inbox delivery). Herdr integration is solid — the supports_event_inbox() gate correctly routes herdr through native status derivation, skips FIFO/pipe-pane, and preserves the socket-event inbox wiring.

Summary of findings (details in inline comments)

API contract break: TerminalStatus.UNKNOWN replaces ERROR as the fallback — downstream clients may not handle it
Herdr delivery callback: synchronous deliver_pending in async context — low priority but worth noting
BaseProvider docstring: says implementations "should" call strip_terminal_escapes but kiro intentionally doesn't — "MAY" is more accurate
EventBus queue overflow: acceptable with reconciliation sweep as safety net
Missing test: no unit test for herdr branch in status_monitor.get_status()
Dead code: IDLE_PROMPT_PATTERN_LOG constants remain in kiro_cli and claude_code after get_idle_pattern_for_log() removal

anilkmr-a2z · 2026-06-09T19:17:30Z

 class TerminalStatus(str, Enum):
    """Terminal status enumeration with provider-aware states."""

+    UNKNOWN = "unknown"


API contract break: ERROR -> UNKNOWN

This changes the public contract: GET /terminals/{id} previously returned "error" when no output or no pattern matched; now it returns "unknown". This is arguably more correct (no output != error), but it is a breaking change for any downstream client (web UI, flow scripts, external integrations) that checks status == "error" as their catch-all.

Worth calling out explicitly in the PR description that this is an intentional API change. Downstream consumers need to handle "unknown" -- or at minimum confirm the web UI and flow scripts do not branch on "error" for the "not yet producing output" case.

anilkmr-a2z · 2026-06-09T19:17:30Z

+        if get_backend().supports_event_inbox():
+            try:
+                provider = provider_manager.get_provider(terminal_id)
+            except Exception:


Missing test for herdr branch

This if get_backend().supports_event_inbox() branch is the critical path for herdr status queries. I do not see a dedicated unit test that exercises it (terminal_service tests mock the backend but status_monitor tests do not). A targeted test with supports_event_inbox=True + a mock provider returning a known status would pin this contract.

anilkmr-a2z · 2026-06-09T19:17:30Z

+        stream (cursor-positioning escapes, in-place ``\\r`` redraws, OSC titles),
+        NOT a tmux-rendered pane snapshot. Implementations that do structural /
+        line-oriented matching should run it through
+        ``cli_agent_orchestrator.utils.text.strip_terminal_escapes`` first


Docstring says should but kiro intentionally does not

The input contract says implementations that do structural/line-oriented matching should run strip_terminal_escapes. Kiro's get_status intentionally preserves
for permission prompt detection (and has a comment saying Do not switch this to strip_terminal_escapes). Consider softening to MAY so a future contributor does not fix kiro to comply with this docstring and break permission detection.

anilkmr-a2z · 2026-06-09T19:17:30Z

    cleanup_expired_memories,
    cleanup_old_data,
 )
+from cli_agent_orchestrator.services.event_bus import bus


Herdr delivery callback runs synchronous subprocess in async context

The deliver_inbox callback (used by HerdrInboxService) calls inbox_service.deliver_pending() synchronously. That method now also calls status_monitor.get_status() which for herdr invokes provider.get_status() -> get_native_status() (subprocess). Low priority since the reconciliation sweep catches misses, but for parity with the InboxService event-bus consumer (which wraps in asyncio.to_thread), this could benefit from the same treatment.

The event-driven status pipeline strips terminal escapes from the raw 8KB FIFO buffer to feed provider.get_status(). _LINE_START_CSI in strip_terminal_escapes already turned CHA (\x1b[1G) and CNL (\x1b[E) into \n so per-line patterns work, but missed CUP — Cursor Position to column 1 (\x1b[<row>;1H). Codex's TUI lays out its bottom prompt + status bar via CUP rather than CHA: \x1b[46;1H› places the idle ``›`` at column 1 of row 46. Without normalising this, the ``›`` glyph stayed glued mid-stream, e.g. ›Improve documentation in @filenameopenai.gpt-5.5 medium · /tmp/... The per-line check at codex.py:get_status:370 only inspects the bottom five lines for ``\s*(?:❯|›|codex>)``, so the prompt was never detected and codex sessions reported PROCESSING forever — every codex e2e test hit "Codex initialization timed out after 60 seconds" (0/12 passing). Extend _LINE_START_CSI to also match \x1b[\d+;1H. After the fix the ›-prefixed idle prompt sits on its own line, the existing IDLE_PROMPT_PATTERN check matches, and codex idle detection works: in replay against captured FIFO logs, 14/15 codex sessions reach idle correctly. Remaining e2e failures are MCP startup >60s and OpenAI ``stream disconnected before completion`` API errors — outside the scope of status detection. Add a regression test covering both \x1b[46;1H› (codex's bottom prompt) and the bare \x1b[1;1Hb form.

…on event-driven pipeline Under tmux capture-pane, a TUI that redraws over its boot screen hides "Initializing..." and the MCP-init line — Kiro's TUI -> --legacy-ui fallback path also calls StatusMonitor.reset_buffer when retrying so the rolling buffer stays clean. Yolo mode does NOT take that fallback path: it forces --legacy-ui directly at launch, so its rolling FIFO buffer keeps the boot bytes forever even after kiro has redrawn over them and the actual interactive prompt is showing below. Until now get_status returned PROCESSING unconditionally whenever TUI_INITIALIZING_PATTERN matched anywhere in the buffer. Yolo sessions therefore reported PROCESSING for the entire session and wait_until_status({IDLE, COMPLETED}) timed out — kiro yolo e2e went from passing to 0/11 under the new event-driven pipeline (awslabs#273). Make Check 0 position-aware: only return PROCESSING from a matching TUI_INITIALIZING_PATTERN when no real ``[agent] >`` idle prompt appears AFTER the last init match. The new-TUI placeholder ("Ask a question or describe a task") is intentionally NOT counted as a real idle prompt because the new TUI renders it during boot; that pre-existing test (test_tui_initializing_yields_processing_despite_idle_placeholder, issue awslabs#211) still passes. Update test_mcp_server_init_yields_processing — its old fixture asserted boot-line + "[developer] !>" should yield PROCESSING, but real captured Kiro logs show the [developer] prompt only renders AFTER init completes, so the assumption was incorrect. Replace the fixture with the actual placeholder text Kiro shows during boot, and add test_mcp_server_init_with_post_init_prompt_yields_idle to lock in the post-init redrawn-stale case (the yolo failure mode). Verified end-to-end: kiro e2e (yolo, --legacy-ui) 0/11 -> 11/11.

… flap PR awslabs#273's event-driven pipeline derives status from a rolling 8KB FIFO buffer fed to per-provider get_status(). TUI redraws (status-bar refreshes, cursor positioning, footer repaints) keep emitting bytes for seconds AFTER the agent settles, eventually evicting the idle/response markers from the 8KB window. Without latching, status flaps rapidly between IDLE/COMPLETED and PROCESSING/UNKNOWN, and both wait_until_status (server-side) and the e2e tests' HTTP polling miss the brief ready windows — causing codex 60s init timeouts, gemini 240s init timeouts, and completion-timeout failures. StatusMonitor now refuses two downgrades once a ready status latches: - ready -> PROCESSING/UNKNOWN (typical buffer-eviction flap) - COMPLETED -> IDLE (codex's user-marker evicts before assistant's, so last_user is None and provider falls back to IDLE, silently overwriting COMPLETED) notify_input_sent() arms a one-shot revert gate so the latch releases when external input legitimately starts a new processing cycle: - terminal_service.send_input / send_special_key (runtime input) - each tested provider's initialize() before its launch keystrokes and bypass/trust prompt acknowledgements codex get_status now also handles the "long response evicted the user marker" case directly: when last_user is None, scan above the TUI footer for an assistant bullet and return COMPLETED instead of IDLE (was returning IDLE forever, which the COMPLETED->IDLE block above would otherwise cement as an off-by-one defence). gemini_cli now declares extraction_tail_lines = 5000 so its existing extraction_retries (3 x 10s) actually fires. The escalating-fetch path (200/500/1000/5000) ran each step once with no inter-step waits, so Ink-TUI redraws still rendering at extraction time fell through to [PARTIAL RESPONSE], leaving the TUI footer in the output and tripping the "? for shortcuts not in output" assertion. E2E results on PR awslabs#273 with these changes: codex 10/12 passing (1 pre-existing extraction issue, 1 xfail) claude 12/12 gemini 12/12 kiro 11/11

…e thread-safe Two hardening fixes to the sticky ready-status latch (PR #1): 1. Arm semantics: notify_input_sent()'s one-shot arm was consumed by ANY new ready latch, including a ready→ready downgrade flap (COMPLETED→IDLE when a large paste evicts the response markers, WAITING_USER_ANSWER→IDLE after a permission keystroke). Consuming the arm there blocks the genuine PROCESSING that follows, so the terminal reads ready while the agent is busy — and InboxService delivers on IDLE/COMPLETED, so a queued message could be pasted mid-response. The arm is now consumed only by a PROCESSING transition or a genuine non-ready→ready (init-style) latch. 2. Thread safety: the latch decision is a read-modify-write sequence (read armed → decide transition → consume arm) executed on the asyncio consumer, while notify_input_sent()/get_status()/clear_terminal() run on FastAPI threadpool, inbox delivery workers, and the cleanup thread. Guard StatusMonitor state with a lock (provider regex analysis and bus.publish stay outside it). Also covers the pre-existing Copilot review comments about unsynchronized _buffers/_last_status access. Adds a latching state-machine test suite (12 cases) pinning blocked downgrades, arm consumption, flap survival, and event publication.

Real failure from the supervisor-handoff e2e: mid-turn, Claude shows an interim completion summary from an earlier thinking phase, then keeps working — '● Calling cao-mcp-server… (ctrl+o to expand)' with a live '✢ Misting… (33s · ↑ 332 tokens)' spinner and a '⎿ Tip: …' hint line between the spinner and the input box. Two gaps combined into a false COMPLETED, which the StatusMonitor ready-latch then pinned until the next input, so the e2e test extracted mid-flight output: - _boxless_completion_tail treated ANY completion summary after the last separator as a finished turn. It now requires that no live spinner (glyph + gerund + ellipsis) renders after the summary — an interim summary followed by a fresh spinner keeps the turn PROCESSING. - The new-TUI box-spinner check looked only at the single line directly above the input box; '⎿ Tip:' hint lines and blanks render between the live spinner and the box, hiding it. The check now walks up to 4 lines, skipping ⎿ continuation lines and blanks. Regression tests reproduce both shapes (fail pre-fix) plus a guard that a summary with no following spinner still reads COMPLETED.

The newest 'Kimi Code' TUI renders user messages as ✨-prefixed prompt lines (no ╭─ input box), responses as • bullets, and a '── input ──' rule plus status bar as footer. extract_last_message_from_script() preferred the last ╰─ box-end as the response anchor — which in the new TUI matches decorative boot banners (Kimi welcome box, FastMCP server banner) ABOVE the conversation — and only stopped at a bare idle prompt, which the new TUI never renders. Result: the 'response' was sliced from the boot screen and ran to end-of-capture (raw spinner frames, status bar), failing the supervisor-assign e2e. - Anchor on the LATEST marker (box-end vs prompt-with-input) instead of box-first: the response always follows the last user input, whichever marker style rendered it. - Stop the response at the new-TUI footer: the '── input ──' rule or the status-bar/context-footer line, in addition to the bare idle prompt. Fixtures are ground truth from a live Kimi Code session.

- LogWriter: open log files with explicit encoding='utf-8', errors='replace' — a non-UTF-8 platform locale (POSIX/C) would raise UnicodeEncodeError on the first unencodable chunk and stop log persistence for that terminal. - EventBus.set_loop: accept Optional[loop] (test fixtures already call set_loop(None)); publish() now guards against the loop being closed during shutdown instead of raising RuntimeError from a FIFO reader thread draining its last chunks. - BaseProvider.get_status docstring: soften the strip_terminal_escapes 'should' to MAY with an explicit note that kiro_cli intentionally preserves raw \r for permission-prompt detection and must not be 'fixed' to comply. The 'except Exception swallows CancelledError' comments are intentionally NOT addressed: since Python 3.8, asyncio.CancelledError derives from BaseException, so those loops do not swallow cancellation.

Kimi (v1.20+) launches via 'cd <temp_dir> && kimi --agent-file <temp_dir>/agent.yaml', writing the system prompt — including the injected skill catalog — to <temp_dir>/system.md. The catalog is no longer in the CLI command string, and the full-screen Kimi Code TUI clears the visible screen on startup, so asserting against raw tmux scrollback always failed. Parse the temp dir from the launch command (capture-pane -J so soft-wrapped lines can't split the path mid-token) and assert against system.md on disk — same pattern as the existing gemini GEMINI.md branch.

…de TUI The newest TUI renders the live spinner BETWEEN the '── input ──' rule and the status bar, and repaints the status bar with every frame — so the spinner-vs-ready-chrome position compare read 'ready' mid-turn whenever a full footer repaint was the freshest chunk (measured: one supervisor turn flapped completed↔processing 29 times in a 57KB stream). The StatusMonitor ready-latch then pinned the first false COMPLETED ~130ms after dispatch, so supervisor flows extracted mid-flight output (supervisor-assign e2e). Detection now uses signals validated by replaying captured live streams: - Live-spinner lines: braille/moon glyphs (incl. tool-call lines like '⠹ Using handoff({…})'), EXCLUDING boot chrome ('⠧ MCP Servers: 0/1', '⠋ Loading configuration...') which legitimately shows braille while idle at the welcome screen. - In flight when a live spinner is within the freshest 15 lines OR renders after the last '•' bullet (covers chunk boundaries where streamed thinking text temporarily pushed the spinner out of the tail). - Dispatch grace: 5s after send_input() (mark_input_received now stamps the dispatch), bridging the paste→first-spinner-frame gap. - Rendered-pane confirmation: a ready-looking chunk boundary mid-turn is byte-identical to one at real completion (stale spinner ~21 lines back, bullets 2-3 from the end in BOTH), so when the stream says ready post-dispatch, confirm against the rendered pane — tmux's compositor has resolved every in-place redraw, so a visible spinner is live, not stale. Also raises the e2e SUPERVISOR_COMPLETION_TIMEOUT 300→600s: a kimi worker alone takes ~3m20s on the report-template handoff; 300s only looked sufficient while the false early COMPLETED was masking the real duration. E2E (live agents): kimi supervisor handoff + assign_and_handoff now pass; kimi handoff x2, send_message, skills unaffected and green.

…or/NotebookEdit The allowed-tools e2e caught restricted agents escaping their restrictions live: a reviewer (no Bash/Write) created the forbidden file anyway — first 'via a delegated subagent that ran the write through a shell command' (Task), then on retry via 'the Monitor tool' (background shell scripts). CAO computes --disallowedTools as (tool universe - allowed natives), and claude_code's universe only contained Bash/Read/Edit/Write/Glob/Grep — so the execution-capable tools outside it were never disallowed. Gate them by privilege equivalence: - execute_bash now maps Bash, BashOutput, KillShell, Task (subagents run with their own full toolset), and Monitor (runs arbitrary shell scripts). Anything these can do, Bash can too, so profiles allowed execute_bash lose nothing. - fs_write now maps NotebookEdit alongside Edit/Write (.ipynb writes). Known limitation: this is enumeration against an evolving toolset — each new execution-capable Claude Code tool needs adding here. A deny-by-default mechanism upstream would be the durable fix. E2E (live, herdr backend): all 4 TestClaudeCodeAllowedTools pass, including the two that previously demonstrated the escape.

anilkmr-a2z

All review concerns addressed in the latest commits:

BaseProvider docstring: changed "should" to "MAY" with explicit kiro carve-out
StatusMonitor tests: TestStickyLatching class covers the herdr and tmux state machine paths
Thread safety: RLock added to StatusMonitor
EventBus: set_loop accepts None, publish handles closed-loop RuntimeError gracefully
LogWriter: explicit UTF-8 encoding

The UNKNOWN vs ERROR API contract change is the only item not explicitly called out in PR description -- worth a one-liner there for changelog purposes but not blocking.

Herdr integration verified intact across all new commits. Ship it.

Copilot

Pull request overview

Copilot reviewed 80 out of 81 changed files in this pull request and generated 3 comments.

@haofeif

* feat(providers): add Cursor CLI as a first-class provider Adds support for the Cursor CLI (agent, https://cursor.com/cli) so it can be orchestrated alongside Claude Code, Kiro CLI, Codex, and the other providers already supported by CAO. Resolves issue #264. The provider is built on the post-event-driven-architecture (post-#273) provider API: async initialize(), get_status(output) that takes the StatusMonitor buffer string directly, and get_backend() for tmux I/O. What the provider does: - Launches the interactive 'agent' REPL (the primary command per Cursor's official docs; cursor-agent is the historical alias and resolves to the same binary). --print is intentionally NOT used so the inbox service can stream follow-up prompts via MCP handoff. - Forwards the agent profile's system prompt via --system-prompt (newlines escaped for tmux compatibility), with the skill catalog appended. - Forwards profile.mcpServers via --mcp <json>, injecting CAO_TERMINAL_ID into each server's env so MCP tools can identify the current terminal. - Honors profile.model via --model (overridable via the constructor). - Bypasses the per-tool approval dialog with --force, the workspace-trust dialog with --trust, and the per-MCP-server approval dialog with --approve-mcps, so worker agents spawned via handoff/assign do not block. - Soft tool-restriction enforcement via SECURITY_PROMPT prepended to the system prompt when allowedTools is set (Cursor CLI does not yet expose a --disallowedTools equivalent). Status detection (mirrors the Claude Code provider's robust pattern): - Structural spinner-before-separator check for PROCESSING. - Fallback position-based spinner check before the first separator. - Idle / trust / permission prompts distinguished by pattern priority. - Message extraction uses the structural separator + trailing prompt pattern, since Cursor CLI does not emit a single canonical response marker like Claude Code's \u23fa. Files: - src/cli_agent_orchestrator/providers/cursor_cli.py: new CursorCliProvider (BaseProvider implementation, post-#273 API). - src/cli_agent_orchestrator/providers/manager.py: register cursor_cli. - src/cli_agent_orchestrator/models/provider.py: add CURSOR_CLI to enum. - src/cli_agent_orchestrator/cli/commands/launch.py: add cursor_cli to PROVIDERS_REQUIRING_WORKSPACE_ACCESS. - test/providers/test_cursor_cli_unit.py: 53 unit tests (regex patterns, get_status, extract, build command, async initialize, lifecycle, manager registration, workspace access). - test/providers/fixtures/cursor_cli_*.txt: 4 status fixtures (idle, completed, processing, permission). - docs/cursor-cli.md: full provider documentation. - README.md: new row in the provider table; quickstart and cross-provider sections updated to include cursor_cli. Quality gate: - black / isort: clean - mypy on new/changed files: 0 errors - pytest: 2372 passed, 0 failed (full test suite minus e2e/integration) - pytest test/providers/test_cursor_cli_unit.py: 53 passed - pytest test/providers/test_provider_manager_unit.py: 15 passed (no regression) Co-authored-by: kiloconnect[bot] <240665456+kiloconnect[bot]@users.noreply.github.com> * test(cursor_cli): cover e2e examples/assign + dedupe extraction branch Addresses review comments on PR #296. Codecov (98.62%, 2 missing lines): - Removed the unreachable 'Incomplete Cursor CLI response - no separator before idle prompt' branch in extract_last_message_from_script. The early check 'if not separators or not idle_matches' already rejects the only case where this branch could fire (no separator at all before the trailing prompt). Replaced with an explicit assert end_sep is not None plus a comment explaining the invariant. Cover is now 100%. haofeif (verifies examples/assign e2e test passes with cursor_cli): - Added require_cursor fixture in test/e2e/conftest.py (matches the require_kimi / require_copilot pattern; checks both 'agent' and the legacy 'cursor-agent' alias). - Added e2e test classes for every flow: * TestCursorCliAssign (3 tests: data_analyst, report_generator, assign_with_callback) * TestCursorCliHandoff (2 tests: simple_function, second_task) * TestCursorCliSendMessage (1 test) * TestCursorCliAllowedTools (3 tests; restricted_supervisor marked xfail because cursor_cli uses soft enforcement via SECURITY_PROMPT — no native --disallowedTools equivalent) * TestCursorCliSupervisorOrchestration (3 tests including assign_three_analysts, the canonical examples/assign smoke test) - Added cursor_cli to the launch examples in examples/assign/README.md so users can run 'cao launch --agents analysis_supervisor --provider cursor_cli'. New unit test: - test_extracts_with_only_one_separator covers the start_sep=None fallback path in extract_last_message_from_script (one-separator buffers where the response-start separator has scrolled out of the 8KB rolling window). Files: examples/assign/README.md, src/cli_agent_orchestrator/providers/cursor_cli.py, test/e2e/{conftest,test_assign,test_handoff, test_send_message,test_allowed_tools, test_supervisor_orchestration}.py, test/providers/test_cursor_cli_unit.py Quality gate: - black / isort: clean (238 files) - mypy on new/changed files: 0 errors - pytest test/ --ignore=test/e2e -m 'not integration': 2373 passed, 0 failed (was 2372 before this commit) - pytest test/providers/test_cursor_cli_unit.py: 54 passed - pytest test/e2e/ --collect-only -m e2e: 80 tests collected (5 new cursor_cli test classes included) Co-authored-by: kiloconnect[bot] <240665456+kiloconnect[bot]@users.noreply.github.com> * docs(cursor_cli): add e2e prerequisites, 11-test matrix, and smoke-test guide Исправлено с кило минимакс м3. Addresses the @haofeif review comment requesting that the new provider be validated with the examples/assign e2e workflow. The e2e test classes themselves were added in commit 8384c24; this commit expands docs/cursor-cli.md to give maintainers a complete how-to-run guide. The End-to-End Testing section now documents: - **Prerequisites**: Cursor CLI binary, CAO server, the three examples/assign/ profiles installed for the cursor_cli provider (with the --provider cursor_cli flag), and tmux. - **Pytest invocation**: the exact uv run pytest -m e2e ... -k cursor_cli -o "addopts=" form, with per-file invocations for handoff / assign / send_message / allowed_tools / supervisor_orchestration. Notes that the default pytest addopts excludes the e2e marker and the override is required. - **The 11 core e2e tests** (per skills/cao-provider/references/ lessons-learnt.md lesson #20, which lists the 11 minimum-success tests per provider). Each row links the test class + method to what it validates, and explicitly notes that test_restricted_supervisor_cannot_bash is marked xfail (soft enforcement via SECURITY_PROMPT — documented limitation). - **Manual examples/assign/ smoke test** as a quick interactive validation outside pytest, with the 'supervisor must NOT do the work itself' invariant called out and a pointer to lessons #19 and #16 for common failure modes. - **Troubleshooting entry #6** for the 'E2E tests skip with Cursor CLI not installed' auto-skip path (require_cursor fixture behaviour). The doc structure follows docs/claude-code.md as the reference template; section ordering, heading levels, and code-block formatting all match. Files: docs/cursor-cli.md Co-authored-by: kiloconnect[bot] <240665456+kiloconnect[bot]@users.noreply.github.com> * fix(cursor_cli): strip full terminal escapes + restore q_cli/opencode_cli in README Исправлено с кило минимакс м3. Addresses the two Copilot review comments (submitted against commit 0f6f678 via the copilot-pull-request-reviewer[bot] account). **Copilot review #1 (src/cli_agent_orchestrator/providers/cursor_cli.py line 403 — extract_last_message_from_script escape handling):** Copilot flagged that the function operated on tmux 'capture-pane -e' output (escape sequences enabled) but only stripped SGR colour codes. Cursor CLI re-renders cursor-positioning and OSC sequences inside the response area, and these can leak into the extracted text or break the separator/prompt detection in get_status(). Fixes: - The separator regex in BOTH get_status() and extract_last_message_from_script() now tolerates any CSI sequence (not just SGR) interleaved between the box-drawing characters, using the strict ECMA-48 grammar (intermediate bytes 0x30-0x3F, final byte 0x40-0x7E). This prevents a stray 'ESC [' introducer from being consumed. - The response region in extract_last_message_from_script() is now stripped with a full-escape regex that handles CSI ('\x1b[...' final byte), OSC ('\x1b]...' BEL or ST), and 2-byte ESC sequences ('\x1b<intermediates><final>'). The docstring explicitly explains why we do NOT use the shared strip_terminal_escapes() helper (it normalises \r → \n, which would split single-line spinner frames into multiple lines — destructive for response extraction). **Copilot review #2 (README.md line 167 — Quick Start 'Valid:' list):** The Quick Start snippet's inline '# Valid:' provider list omitted 'q_cli' even though 'q_cli' is a supported provider everywhere else in the README and in ProviderType. The cross-provider section's bullet list was also missing 'opencode_cli'. Fixes: - README.md line 167: restored 'q_cli' to the inline list. - README.md line 256: added 'opencode_cli' to the cross-provider list (was already in the inline quickstart but missing here). **New unit tests:** - test_separator_matching_tolerates_interleaved_csi_escapes: exercises a separator line with SGR colour escapes between every box-drawing character (\x1b[38;5;245m────...\x1b[0m). Asserts both get_status() and extract_last_message_from_script() still find the boundary. - test_extraction_strips_cursor_positioning_sequences: injects \x1b[2K (erase line) and \x1b[H (cursor home) into the response region and verifies they are stripped from the result. - test_extraction_strips_osc_title_sequences: injects an OSC window-title update (\x1b]0;Cursor Agent\x07) and verifies it is stripped from the result. **Quality gate:** - black / isort: clean (238 files) - mypy src/cli_agent_orchestrator/providers/cursor_cli.py: Success: no issues found - pytest test/ --ignore=test/e2e -m 'not integration': 2376 passed, 0 failed (was 2373 before this commit) - pytest test/providers/test_cursor_cli_unit.py: 57 passed (was 54; +3 escape-handling tests) - All 7 commits in the branch force-pushed to feat/cursor-cli-provider on the fork Co-authored-by: kiloconnect[bot] <240665456+kiloconnect[bot]@users.noreply.github.com> * fix(cursor_cli): address 6 Copilot review comments on PR #296 Исправлено с кило минимакс м3. Addresses all six inline review comments from the copilot-pull-request-reviewer[bot] review submitted at 2026-06-15T07:45:57Z against commit c5d1523 (PR #296). **1. IDLE_PROMPT_PATTERN must be start-of-line anchored (review #3411781807):** The previous pattern [❯>][\s\xa0] would match the leading '❯ ' on echoed user input lines (e.g. '❯ Summarize…') or any '> ' inside response content. Since get_status() returns COMPLETED whenever *any* match exists, this could misclassify status and could also make extract_last_message_from_script() anchor on the wrong 'idle' prompt. Fix: pattern is now ^\s*(?:\x1b\[[0-9;]*m)*[❯>](?:\x1b\[[0-9;]*m)*[\s\xa0] (start-of-line, optional leading whitespace, optional SGR colour codes before the prompt char, optional SGR codes after, then the '❯' or '>' and a single whitespace). MULTILINE flag is set on the finditer call so '^' matches at every line start, not just the buffer start. Matches the claude_code provider's _SOL_IDLE_RE pattern. **2. IDLE_PROMPT_PATTERN_LOG must be start-of-line anchored (review #3411781846):** Same fix applied to the log-file variant so the pre-check is consistent with live status detection. The log variant omits the SGR code allowances (logs are plain text) but retains the ^\s* start-of-line anchor. **3. initialize() must arm the StatusMonitor stickiness gate (review #3411781865):** initialize() now calls status_monitor.notify_input_sent() before send_keys so the next PROCESSING transition isn't suppressed when a ready status was previously latched. The import is lazy to break a circular import: status_monitor imports provider_manager which imports cursor_cli. **4. _build_cursor_command() must fall back to cursor-agent (review #3411781886):** The build path now uses shutil.which() to prefer the primary 'agent' binary and fall back to the legacy 'cursor-agent' alias when only that one is installed. Raises ProviderError with an install-from-URL message when neither is on $PATH. The e2e require_cursor fixture in test/e2e/conftest.py accepts either name, so the launch now behaves consistently. **5+6. Separator regex must tolerate CSI *between* dashes (reviews #3411781900 + #3411781914):** The previous regex (?:\x1b\[[\x30-\x3F]*[\x40-\x7E])*\u2500{20,} only allowed CSI sequences *before* the entire dash run — not between the dashes. The new pattern is:: ^(?:\x1b\[[\x30-\x3F]*[\x40-\x7E])?(?:\u2500(?:\x1b\[[\x30-\x3F]*[\x40-\x7E])?){20,}$ This is a repeated unit (─ + optional CSI) 20+ times. The optional CSI at the front handles Cursor's initial SGR colour setup. Intermediate bytes are restricted to the ECMA-48 param range (0x30-0x3F) so a stray 'ESC [' introducer is not consumed. The pattern is anchored to a full line so a stray dash sequence inside response content is not matched. The MULTILINE flag is required on finditer so '^' and '$' match at every line start/end. **New unit tests (11 added, 68 total in test_cursor_cli_unit.py):** - TestRegexPatterns::test_idle_prompt_is_start_of_line_anchored - TestRegexPatterns::test_idle_prompt_rejects_arrow_in_response_content - TestRegexPatterns::test_idle_prompt_log_is_start_of_line_anchored - TestSeparatorPattern::test_matches_plain_separator - TestSeparatorPattern::test_matches_csi_before_dash_run - TestSeparatorPattern::test_matches_csi_between_dashes - TestSeparatorPattern::test_does_not_match_dash_sequence_inside_content - TestBuildCommandBinaryResolution::test_prefers_agent_when_both_available - TestBuildCommandBinaryResolution::test_falls_back_to_cursor_agent_when_agent_missing - TestBuildCommandBinaryResolution::test_raises_when_neither_binary_installed - TestInitialize::test_initialize_arms_stickiness_gate Also added a module-level autouse _stub_cursor_binary fixture that patches shutil.which('agent') to return /usr/bin/agent for every test, so existing tests don't have to opt in to the binary-resolution mock. The 3 new TestBuildCommandBinaryResolution tests override this fixture to test the legacy-alias fallback and the missing-both error path. **Quality gate:** - black / isort: clean (238 files) - mypy src/cli_agent_orchestrator/providers/cursor_cli.py: Success - pytest test/ --ignore=test/e2e -m 'not integration': 2387 passed, 0 failed (was 2376 before this commit; +11 new tests) - pytest test/providers/test_cursor_cli_unit.py: 68 passed (was 57) - All 6 Copilot review comments addressed in this commit. Co-authored-by: kiloconnect[bot] <240665456+kiloconnect[bot]@users.noreply.github.com> * fix(cursor_cli): drop --trust, expose cursor_cli in /agents/providers and UI Three small follow-ups uncovered by a real end-to-end test of the Cursor CLI provider on Cursor CLI v2026.06.15 (the version installed via `curl https://cursor.com/install | bash`): 1. `cursor_cli` was missing from the `provider_binaries` dict in `/agents/providers` (`api/main.py`), so the web UI's provider dropdown never advertised it as installed, even with the binary on PATH. Added `"cursor_cli": "agent"` to match the resolution logic in `_build_cursor_command`. 2. `FALLBACK_PROVIDERS` in `web/src/components/AgentPanel.tsx` (the list shown when the API doesn't return providers, e.g. before the server is queried) was also missing `cursor_cli`. Added it. 3. `--trust` is rejected by Cursor CLI v2026.06.15 in interactive REPL mode (`Error: --trust can only be used with --print/headless mode`) and caused the launch to fail with a 500. Dropped it: the CAO launch flow already confirms workspace trust, and the interactive REPL doesn't have a per-directory trust dialog that `--trust` would skip. `--force` is still passed so per-tool approvals don't block. All 68 unit tests in `test/providers/test_cursor_cli_unit.py` and the web UI tests still pass. * fix(cursor_cli): add v2026+ TUI-placeholder status detection (issue #299) Cursor CLI v2026.x runs as a full Ink/TUI in interactive mode. The `❯` prompt and the `─────` separator that older text-mode builds emitted into the pipe-pane buffer are now TUI widgets and never reach the FIFO; the regex suite in the original provider (matching those markers) returns UNKNOWN forever, and `wait_until_status(... IDLE, COMPLETED)` times out after 30 seconds — the same 500 the issue reports. The only stable plain-text signal the v2026 TUI emits is the input-box placeholder "Plan, search, build anything". Cursor REPLACES it with the user's text on submit and only redraws it once the response is fully delivered, so: * present in the tail of the rolling buffer → IDLE / COMPLETED * absent (replaced by the user's text) → PROCESSING This commit: * adds `TUI_PLACEHOLDER_PATTERN` and `TUI_STATUS_BAR_PATTERN` constants and consults the last 1KB of the cleaned buffer for the placeholder before falling through to the existing separator-based regex suite (which still classifies older text-mode Cursor builds correctly); * records a real v2026.06.15 idle fixture (cursor_cli_v2026_idle_output.txt) captured via tmux pipe-pane + cat, and a synthetic v2026 processing fixture (placeholder replaced by user text); * adds a `TestGetStatusV2026Tui` test class with 8 tests that cover the placeholder present/absent cases, the TUI TAIL WINDOW contract (long-response eviction does not flip back to IDLE), and the status-bar guard so a half-initialised TUI does not false-positive; * drops `--trust` from the launch command — v2026 rejects `--trust` in interactive REPL mode ("only works with --print/headless mode"), the CAO launch flow already confirms workspace trust, and the interactive REPL has no per-directory trust dialog for the flag to skip anyway. `--force` is still passed so per-tool approvals do not block. All 75 unit tests in test/providers/test_cursor_cli_unit.py pass (7 new ones). The v2026 fixtures and the placeholder detection are isolated from the original regex suite, so older text-mode builds keep being classified the same way as before. End-to-end validation of the full launch against v2026 is BLOCKED by a separate, deeper issue uncovered while running this patch: v2026 has no `--agent` flag, so the provider's command exits immediately with "error: unknown option '--agent'" before the TUI is ever rendered. Tracked separately. The TUI detection in this commit is correct in isolation and will be needed once the launch command is fixed. * fix(cursor_cli): rebuild launch command for Cursor CLI v2026 (issue #300) Cursor CLI v2026.06.15 dropped two flags the original provider relied on: * `--agent <name>` (rejected with "error: unknown option '--agent'") * `--mcp <json>` (rejected with "error: unknown option '--mcp'") It also changed the semantics of an existing flag: * `--system-prompt` now takes a *file path* rather than inline text ("Error: failed to read --system-prompt file: <text>" when given inline text) The v2026 equivalents / replacements are: * `--agent <name>` -> none. The CAO agent profile body is carried in the `--system-prompt` file instead, so multi-agent orchestration (handoff / assign) still works. * `--mcp <json>` -> `--plugin-dir <path>` pointing at a directory holding a Cursor plugin manifest. We synthesise the directory at build time, materialising the profile's mcpServers map into the manifest's `mcpServers` field and forwarding `CAO_TERMINAL_ID` so MCP tools can resolve the current terminal. `--approve-mcps` is still passed to skip per-server approval dialogs. * `--system-prompt` -> writes the prompt to a per-session file under `~/.aws/cli-agent-orchestrator/tmp/<tid>-system-prompt.md` and passes the path. All 75 unit tests pass. End-to-end validated on this Codespaces: $ curl -X POST .../sessions?provider=cursor_cli&agent_profile=developer HTTP 201 in 7.5s Status changes: unknown -> completed (TUI placeholder detection) $ curl -X POST .../sessions?provider=cursor_cli&agent_profile=data_analyst HTTP 201 in 7.6s Status: idle Both sessions render the v2026 TUI correctly and the StatusMonitor latches the placeholder-driven IDLE/COMPLETED state, so the TUI-detection patch from 9502dd1 (#299) and the launch-command rework in this commit are now both end-to-end functional. Closes #300. * fix(api): honor `*` wildcard in WS_ALLOWED_CLIENTS The WebSocket terminal viewer was rejecting browser connections from any IP that wasn't in the literal allowlist. Operators running cao-server inside a container (Codespaces / devcontainers / remote hosts) could pass `CAO_WS_ALLOWED_CLIENTS="*"` to mean "any client", but the check was an exact `in` comparison so the literal string `"*"` never matched a real client IP and the handler always closed the connection with code 4003. Treat a literal `*` entry in `WS_ALLOWED_CLIENTS` as a wildcard that disables the IP check, matching the same opt-in semantics operators expect for `CAO_ALLOWED_HOSTS`. Container / Codespaces setups that pass `CAO_WS_ALLOWED_CLIENTS="*"` will now accept WS connections from the browser without enumerating the tunnel IP ahead of time. Security note: the WebSocket endpoint exposes unauthenticated PTY access and is intended for localhost-only use; setting `CAO_WS_ALLOWED_CLIENTS="*"` together with `--host 0.0.0.0` on a host reachable from the open internet is a real risk and should be paired with a reverse proxy that enforces auth (the existing comment at the top of the WS handler still applies). * fix(api): enable uvicorn proxy_headers for WS over HTTPS tunnels Codespaces / devcontainers / reverse-proxy setups (anything that terminates TLS in front of cao-server and forwards plain HTTP) need uvicorn to honour X-Forwarded-Proto / X-Forwarded-For. Without `proxy_headers=True`, uvicorn sees the raw HTTP request and the browser's WSS upgrade through the HTTPS tunnel is rejected — the WebSocket terminal viewer closes immediately with no useful diagnostic on the client side. `forwarded_allow_ips="*"` trusts the X-Forwarded-* headers from any upstream. Combined with `CAO_ALLOWED_HOSTS="*"` and `CAO_WS_ALLOWED_CLIENTS="*"` (now wildcard-aware after the previous fix) this is the standard Codespaces setup. Security: the WS endpoint still exposes unauthenticated PTY access; operators fronting cao-server with a reverse proxy that enforces auth should narrow `forwarded_allow_ips` to the proxy's IP range instead of leaving it on the wildcard. * docs(codespaces): add Codespaces setup and troubleshooting guide Add docs/codespaces.md covering server start command, the four CAO_* env vars, port 9889 forwarding, local verification, and a 404 troubleshooting table. Link it from CONTRIBUTING.md and DEVELOPMENT.md. * fix(cursor_cli): drop --system-prompt entirely for v2026.06.15 Cursor CLI v2026.06.15's backend (https://agentn.global.api5.cursor.sh) rejects every request that carries a `--system-prompt <file>` payload with `[invalid_argument] unknown option '--system-prompt'`. The bug is reproducible regardless of file contents — a 3-character file triggers the same error as a 4.5KB system prompt. Cursor's own debug log at /tmp/cursor-agent-logs/session-*.log shows the ConnectError firing on the very first request and all retries failing the same way. Cursor's log also shows the TUI retrying the request 3 times before giving up, which is what was rendering as `Reconnecting (attempt N, Ns)` in the web UI. This commit removes the --system-prompt flag from the launch command entirely. Multi-turn inbox still works because: * the CAO role / system prompt reaches the agent through the cao-mcp-server MCP tool's handoff / assign payloads (on the wire, not via Cursor's launch arguments); * the agent still has the @cao-mcp-server tool set loaded via --plugin-dir, so assign / handoff / send_message all work; * soft tool-restriction enforcement (SECURITY_PROMPT) is no longer available on the launch line — documented in the docstring; needs a different enforcement path if a per-profile restricted mode is required. The _write_system_prompt_file helper is preserved (gated by `if False and ...`) so a future Cursor point release that fixes the backend can re-enable the flag with a single-line change. End-to-end validated in this Codespaces: $ curl -X POST .../sessions?provider=cursor_cli HTTP 201 in 8.5s status: unknown -> completed -> processing $ curl -X POST .../terminals/<id>/input?message=hi%20there HTTP 200 in 0.4s Cursor log: [nal_agent_retries] Request successful Cursor log: outcome=success TUI status bar: "Composer 2.5 Fast 6.4%" (response streaming) 74/74 unit tests pass (one fewer than before — the test_skill_prompt_appended test was redundant with the new test_agent_profile_loaded_but_not_passed_as_flag case). * fix(cursor_cli): use 'ctrl+c to stop' as v2026+ processing signal (issue #299 follow-up) The previous TUI detection landed in 9502dd1 used the input-box placeholder ("Plan, search, build anything" / "Add a follow-up") as the idle / processing signal. Live testing on v2026.06.15 showed that the placeholder is ALWAYS present regardless of agent state — it is the *input box's empty state*, not a "ready for next turn" indicator. The previous detector therefore classified every post-launch TUI frame as PROCESSING (placeholder absent in the 1KB tail after the user submits), and never transitioned back to COMPLETED once the agent finished a turn. The correct v2026+ signal is the "ctrl+c to stop" hint Cursor renders on the same line as the placeholder every frame the agent is actively working on a turn. The hint disappears once the response is fully delivered and the input box is back to the placeholder alone. The hint is rendered in the last few hundred bytes of every Cursor TUI frame, so the same 1KB TUI TAIL WINDOW the previous patch used is still the right scope. Updated get_status() so the primary v2026+ PROCESSING check is "ctrl+c to stop" present in the tail (replacing the previous "placeholder absent in the tail" heuristic). The IDLE / COMPLETED check no longer requires the placeholder in the tail — it is the *absence* of the processing indicator, paired with the status bar being visible, that signals a turn has finished. TUI_PLACEHOLDER_PATTERN now matches BOTH placeholder strings Cursor v2026 uses ("Plan, search, build anything" on a fresh launch, "Add a follow-up" after the first turn) so the fixture-based unit tests cover both conversation states. A new TUI_PROCESSING_INDICATOR_PATTERN ("ctrl+c to stop") is the new primary signal. Both are imported by the test module and covered by test_processing_indicator_pattern_documented. Live validation in this Codespaces (agent CLI v2026.06.15): $ curl -X POST .../sessions?provider=cursor_cli HTTP 201 in 7.5s status: completed (TUI idle, no indicator) $ curl -X POST .../terminals/<id>/input?message=hi%20again T+1s: processing (ctrl+c to stop indicator visible) T+3s: processing T+5s: processing T+8s: completed (indicator gone, response delivered) T+10s+: completed (stable, no false positives) 77/77 unit tests pass (added 2 tests: test_processing_indicator_in_tail_returns_processing and test_processing_indicator_pattern_documented, plus a new post-turn-idle fixture cursor_cli_v2026_post_turn_idle_output.txt that captures the live 'Add a follow-up' placeholder text and the absence of the 'ctrl+c to stop' indicator). * fix(cursor_cli): address haofeif review + clean up v2026 follow-ups Implements the five action items from the human review on #296 plus the OUTDATED Copilot review threads from the v2026 follow-up commits. Action items from haofeif's review: 1. Make forwarded_allow_ips configurable. Replaces the hard-coded forwarded_allow_ips="*" (which trusts X-Forwarded-* from any upstream) with a new TRUSTED_FORWARDER_IPS constant that defaults to ["127.0.0.1", "::1"] and is extended by CAO_FORWARDED_ALLOW_IPS (comma-separated). A literal "*" is still honoured as a disable-the-check opt-in (matches the existing CAO_WS_ALLOWED_CLIENTS="*" semantics), so Codespaces users with no other option get the same behaviour as before. The default now matches the conservative CAO_WS_ALLOWED_CLIENTS default and is safe for bare cao-server --host 127.0.0.1 deployments. 2. Run black. Re-formats cursor_cli.py, test_cursor_cli_unit.py, api/main.py, and constants.py to match the project's [tool.black] config (line-length 100, target-version py310). CI's black check should now pass. 3. Implement temp file cleanup. cleanup() now removes every per-session temp file the provider created: <CAO_TMP_DIR>/<tid>-system-prompt.md and <CAO_TMP_DIR>/<tid>-cursor-plugins/ (including the plugin.json manifest inside the plugin dir). The paths are tracked in self._tmp_paths as the helpers create them, and cleanup() walks the registry, calls shutil.rmtree on directories / Path.unlink on files, swallows transient OSError (logged at WARNING), and drains the registry so a second cleanup is a safe no-op. 4. Remove dead code. Drops the if False and profile is not None block that preserved the v2026-disabled --system-prompt injection path. The _write_system_prompt_file helper is still available for a future Cursor point release; the launch command does not call it. Also removed a duplicate get_status body that had been left over from a merge. 5. Address binary resolution. The provider now prefers the unambiguous cursor-agent alias first (only the Cursor CLI ships it) and falls back to the documented primary agent name. When agent is selected the provider runs an agent --version probe and validates the banner looks like "agent <4-digit-year>.<...>" - gpg-agent and other unrelated agent-named tools on the host PATH no longer get launched with Cursor-only flags. Failed probes / unknown banners raise ProviderError with a clear "uninstall or symlink to cursor-agent" message. Docs + module docstring: - The module docstring was describing flags the provider no longer uses (--system-prompt, --agent, --mcp, --trust). Rewritten to match the v2026 launch command, list the deliberately-omitted flags, and explain the rationale (issue #299 / #300). - docs/cursor-cli.md updated for status detection, permission bypass, agent profile integration, launch command, tool restrictions, and troubleshooting. Test coverage added (82/82 unit tests pass): - test_agent_validation_passes_for_cursor_binary - test_agent_validation_rejects_non_cursor_binary - test_agent_validation_handles_probe_timeout - test_cursor_agent_skips_validation - test_prefers_cursor_agent_when_both_available - test_cleanup_removes_tracked_tmp_paths Fixture fix: - cursor_cli_v2026_processing_output.txt had lost the actual \x1b (ESC) bytes before CSI sequences. Rebuilt with real escape bytes so the fixture exercises the same escape-stripping code path the live v2026 TUI produces. * fix(cursor_cli): split IDLE / COMPLETED on the turn counter The provider used to report COMPLETED for both a fresh spawn (no user input yet) and a finished turn (response delivered, ready for the next prompt). The supervisor inbox and the StatusMonitor's stickiness gate treat both as "ready" so functionally nothing broke, but the UI badge showed the wrong label right after Spawn Agent - the user-visible status was "completed" for a terminal that had not yet received a single message. The split: IDLE = fresh spawn, never received a turn. COMPLETED = at least one turn has been delivered, the agent is back to a non-processing state. Cursor CLI v2026's TUI looks the same in both states (placeholder visible, status bar visible, no "ctrl+c to stop" hint), so the buffer alone cannot distinguish them. The provider now tracks a turn counter that ``mark_input_received`` (the hook the terminal service calls after every ``send_input``) increments. ``get_status`` returns IDLE while the counter is zero and COMPLETED once at least one turn has been delivered. Why not invent a buffer signal? The placeholder text swaps from "Plan, search, build anything" (fresh launch) to "Add a follow-up" (after the first turn), so in principle the placeholder text is a discriminator. But by the time the placeholder has been swapped the first turn has already been delivered - the swap happens on the agent's first input, not on a fresh launch. Using the counter is robust and does not depend on a brittle TUI signal that could change in a future v2026 point release. Implementation: - ``__init__`` initialises ``self._turns: int = 0``. - New ``mark_input_received()`` override increments the counter; called by the terminal service on every input delivery. - The two IDLE / COMPLETED branches in ``get_status`` now return ``COMPLETED if self._turns > 0 else IDLE``. Test updates: - Every test that previously asserted COMPLETED now calls ``provider.mark_input_received()`` first to simulate the post-turn state. - New ``test_idle_fixture_without_input_returns_idle`` and ``test_v2026_idle_fixture_fresh_spawn_returns_idle`` assert the IDLE label for the fresh-spawn state on both legacy and v2026 fixtures. Live validation in this Codespaces: \$ curl -X POST .../sessions?provider=cursor_cli HTTP 201, 7.5s status: idle (T+0s) \$ curl -X POST .../terminals/<id>/input?message=hi status: processing (T+1s-5s) status: completed (T+8s+) \$ curl -X POST .../terminals/<id>/input?message=how are you status: processing (T+1s-8s) status: completed (T+12s+) 84/84 unit tests pass. * fix(tests): address CI failures for #296 - isort: fix import order in test_cursor_cli_unit.py - test_list_providers_all_installed: bump to 10 providers and assert cursor_cli - test_main_custom_host_port / test_main_extends_cors_for_custom_host_port: account for new proxy_headers/forwarded_allow_ips kwargs added to uvicorn.run in 4a82417 (uvicorn proxy_headers for WS over HTTPS tunnels, #149) Co-authored-by: kiloconnect[bot] <240665456+kiloconnect[bot]@users.noreply.github.com> --------- Co-authored-by: ThePlenkov <6381507+ThePlenkov@users.noreply.github.com> Co-authored-by: kiloconnect[bot] <240665456+kiloconnect[bot]@users.noreply.github.com> Co-authored-by: Kilo <kilo@local>

tuankn22 and others added 18 commits March 9, 2026 17:27

refactored but wait_for_* not working

65a9526

working

20636f3

merge from main

b9a80b1

fix merge conflicts

073e27f

update docs and tests

ed6866f

update docs

2a33933

Merge branch 'main' into feat/event-driven-messaging

eab6ac8

rebase and update docs and tests

1c3395a

Merge branch 'main' into feat/event-driven-messaging

7d8c6d9

update tests

605c015

clean up routine

4cc5494

formatting and update kimi_cli to event-driven

f8670b8

Merge branch 'main' into feat/event-driven-messaging

984a2a5

Merge branch 'main' into feat/event-driven-messaging

197059c

Merge branch 'main' into feat/event-driven-messaging

6b84277

fix kiro_cli answer marker pattern

c5920b3

Merge branch 'main' into feat/event-driven-messaging

10a37ce

haofeif requested review from Copilot and patricka3125 June 3, 2026 23:48

Copilot started reviewing on behalf of haofeif June 3, 2026 23:48 View session

haofeif added the enhancement New feature or request label Jun 3, 2026

haofeif requested a review from anilkmr-a2z June 3, 2026 23:52

Copilot AI reviewed Jun 3, 2026

View reviewed changes

call-me-ram added 3 commits June 4, 2026 17:28

call-me-ram added 2 commits June 8, 2026 02:37

haofeif requested a review from Copilot June 9, 2026 01:09

Copilot started reviewing on behalf of haofeif June 9, 2026 01:09 View session

Copilot AI reviewed Jun 9, 2026

View reviewed changes

anilkmr-a2z reviewed Jun 9, 2026

View reviewed changes

haofeif added 3 commits June 10, 2026 10:19

haofeif mentioned this pull request Jun 10, 2026

Fix/issue 115 kiro codex status detection call-me-ram/cli-agent-orchestrator#1

Merged

call-me-ram added 8 commits June 10, 2026 17:30

Merge PR #1: sticky ready-status latching + kiro/codex status fixes

2f7bd32

anilkmr-a2z approved these changes Jun 10, 2026

View reviewed changes

haofeif requested a review from Copilot June 10, 2026 23:29

Copilot started reviewing on behalf of haofeif June 10, 2026 23:29 View session

Copilot AI reviewed Jun 10, 2026

View reviewed changes

Comment thread src/cli_agent_orchestrator/services/terminal_service.py

Comment thread src/cli_agent_orchestrator/api/main.py

Comment thread src/cli_agent_orchestrator/cli/commands/flow.py

call-me-ram mentioned this pull request Jun 11, 2026

Status detection: render the TUI with a terminal emulator (pyte) instead of regex over the raw pipe-pane stream #287

Closed

haofeif merged commit 75e98ac into awslabs:main Jun 11, 2026
29 checks passed

call-me-ram mentioned this pull request Jun 12, 2026

[Feat] External orchestration driver: event-driven waits, worker governance, git-based result contract (implementation available) #291

Open

		### Status Monitor (`services/status_monitor.py`) — Publisher + Consumer

		Subscribes to `terminal.*.output`. Accumulates output into a rolling buffer (8KB) per terminal, detects status via the registered provider (or a generic shell prompt pattern before init), and publishes `terminal.{id}.status` on change. Also the source of truth for current terminal status.

Conversation

call-me-ram commented Jun 3, 2026

Summary

What it does

Reconciling with main

Testing

Uh oh!

codecov-commenter commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

haofeif commented Jun 3, 2026

Uh oh!

anilkmr-a2z commented Jun 4, 2026

Uh oh!

call-me-ram commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Root cause confirmed

What changed

Verified ✅

Needs your check ⚠️ (I don't have these CLIs locally)

Looking ahead — a proposal to stop re-tuning get_status on every TUI redesign

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

call-me-ram commented Jun 9, 2026

Addressed: herdr backend terminal-init hang (Copilot review — utils/terminal.py)

Uh oh!

anilkmr-a2z left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Review — Event-Driven Architecture + Herdr Integration Check

Summary of findings (details in inline comments)

Uh oh!

anilkmr-a2z Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

anilkmr-a2z Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

anilkmr-a2z Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

anilkmr-a2z Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

anilkmr-a2z left a comment

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

Reconciling with `main`

codecov-commenter commented Jun 3, 2026 •

edited

Loading

call-me-ram commented Jun 8, 2026 •

edited

Loading

Looking ahead — a proposal to stop re-tuning `get_status` on every TUI redesign

Addressed: herdr backend terminal-init hang (Copilot review — `utils/terminal.py`)

anilkmr-a2z left a comment •

edited

Loading