Plan: Refactor tools/ directory structure#21
Conversation
Coverage Report
Generated by pytest-cov on |
| │ │ ├── sandbox.py # sandbox rendering helpers/interceptor support | ||
| │ │ ├── task.py # TaskRenderer | ||
| │ │ └── skill.py # SkillRenderer | ||
| │ └── command_interceptors/ |
There was a problem hiding this comment.
We might want to refactor this, so we have it like:
- tools/
- rendering/
- tools/
- command/
- __init__.py
- interceptors/
There was a problem hiding this comment.
We should include this into this refactor.
| ├── gate-check.py # unchanged initially | ||
| ├── sandbox-bootstrap.py # unchanged initially | ||
| ├── run-sweep.py # unchanged initially, may keep calling wrapper | ||
| ├── codecome.py # unchanged initially |
There was a problem hiding this comment.
We might want to unify run-agent.py + codecome.py on a single 'wrapper'..
| ├── script-to-asciinema.py # unchanged | ||
| ├── mock-llm-server.py # unchanged | ||
| ├── mock-llm-parity.py # unchanged | ||
| └── mock_llm_scripts/ # unchanged |
There was a problem hiding this comment.
We should rename mock_llm_scripts to mock-llm-scripts
Create tools/codecome/ package with six modules extracted from the 5,876-line run-agent.py monolith: - codecome/version.py — OpenCode version checks - codecome/config.py — model/prompt/color resolution (self-contained, no _colors import) - codecome/session.py — HTTP session & prompt helpers - codecome/graceful.py — phase completion checks & resume prompts - codecome/transcript.py — transcript path/open/close (thread-safe counter) Also: - Expose _chat_render_and_log / _chat_update_modeline_info as standalone functions so tests work without Textual installed - Fix _cache_invalidate_stale to remove entries for deleted files - Update tests to load codecome.* modules directly - Add test_session.py, test_show_model_table, test_cache_invalidate_stale Tests: 282 passed, 0 failed, 0 errors
Create tools/rendering/ with architecture classes — no renderers migrated yet. The package provides: - RenderSettings — 20+ display tunables from env vars - SnapshotCache — file content snapshot for write/edit diffs - RenderSink — protocol + PlainSink, RichConsoleSink, TextualRichLogSink - RenderContext — root, sink, settings, cache bundle - EventRenderer — base for generic SSE event renderers - ToolRenderer — base for per-tool renderers (incl. fallback) - RendererRegistry — dispatches events/tools to registered renderers - CommandExecutionInterceptor — protocol for specialised bash rendering Tests: 33 new (315 passed, 0 failed, 0 errors)
pruiz
left a comment
There was a problem hiding this comment.
Constructive review of Phase A1 extraction. Overall it looks good and aligned with the plan; I left a few inline comments on small things worth checking before continuing with A2.
- Create rendering/base.py with shared sink/rich/plain properties - EventRenderer and ToolRenderer now inherit from BaseRenderer - Remove 4x/6x duplicated import/__all__ blocks from 4 files - Keep kimi improvements: str|PathLike in cache, end param in sink, truthy_env import, _int_env/_bool_env helpers
1. Move command_interceptors/ → tools/interceptors/ (per plan) 2. Propagate OSError from open_*_transcript to callers for diagnostic text preservation 3. Use real codecome.config import in monkeypatch tests instead of separately loaded module Includes kimi's Phase A2 refinements: - cache.py: str|PathLike support via os.fspath() - sink.py: end param on write_text, PlainSink flush, TextualRichLogSink TypeError fallback - tools/base.py: inherit from BaseRenderer, use sink.write_text() - Expanded sink/registry/snapshot_cache tests
pruiz
left a comment
There was a problem hiding this comment.
Follow-up review after Phase A2. The architecture looks aligned with the plan; I left a few small inline comments focused on cleanup/contract clarity before moving to renderer migrations.
1. tests/test_rendering_sinks.py: remove 4x duplicated sink test classes (293 lines → 105 lines) 2. tools/rendering/registry.py: require explicit event_types/tool_names on registered renderers; empty tuple no longer acts as catch-all. Fallback renderers are held separately and unaffected. 3. tests/test_rendering_snapshot_cache.py: remove 5x duplicated assert
Extract the simplest renderers from run-agent.py into rendering/tools/: - rendering/tools/todo.py — TodoRenderer (todowrite tool) - rendering/tools/task.py — TaskRenderer (task tool) - rendering/tools/skill.py — SkillRenderer (skill tool) - rendering/tools/permissions.py — PermissionErrorRenderer Each inherits from ToolRenderer/BaseRenderer, uses self.rich/self.plain from the sink abstraction, and delegates to _render_rich/_render_plain. Dispatch in run-agent.py now routes todo/task/skill through the new classes via a lazy RenderContext. Old render_* functions kept for now. Tests: 17 new (334 passed, 0 failed, 0 errors)
Extract three renderers from run-agent.py into rendering/tools/:
- rendering/utils.py — shared helpers (path, lexer, diff,
read framing, internal suppression)
- rendering/tools/read.py — ReadRenderer (read tool)
- rendering/tools/write.py — WriteRenderer (write tool)
- rendering/tools/edit.py — EditRenderer (edit tool)
Each uses self.context.cache (SnapshotCache) for diff baselines
and self.context.settings (RenderSettings) for display tunables.
Uses self.sink.write_text(end='') for plain diff lines.
Dispatch in run-agent.py now routes read/write/edit through the
new classes. Old render_* functions kept for backward compat.
334 passed, 0 failed, 0 errors
Extract the most complex tool renderer from run-agent.py:
- rendering/tools/apply_patch.py — ApplyPatchRenderer
Handles three patch formats: *** envelope, {patches: [...]} JSON list,
and raw unified diff. Uses _ParsedFilePatch dataclass internally.
Cache invalidation via context.cache.reread() on success.
Dispatch wired. 334 passed, 0 failed, 0 errors.
Extract three more renderers from run-agent.py: - rendering/tools/glob.py — GlobRenderer (file listing) - rendering/tools/grep.py — GrepRenderer (match-highlighted) - rendering/tools/command.py — CommandRenderer (generic bash, interceptors deferred) All tool dispatch in _dispatch_tool_renderer now uses the new classes. Sandbox-bootstrap and bash-shim interceptors still run in-process from the dispatch (will be extracted as CommandExecutionInterceptors in batch 6). 334 passed, 0 failed, 0 errors
Create CommandExecutionInterceptor implementations from the two sub-renderers previously in run-agent.py's bash dispatch: - rendering/tools/interceptors/sandbox_bootstrap.py SandboxBootstrapInterceptor — structured panels for sandbox list/inspect/detect/status/apply/regenerate/validate commands - rendering/tools/interceptors/rtk_read.py RtkReadInterceptor — routes rtk read/cat/head/tail through ReadRenderer - rendering/tools/interceptors/rtk_grep.py RtkGrepInterceptor — routes rtk grep/rg through GrepRenderer - rendering/tools/interceptors/shell_listing.py ShellListingInterceptor — routes ls/find/tree through GlobRenderer CommandRenderer now has a lazy interceptor chain that runs before the generic bash fallback. Dispatch in run-agent.py simplified to a single call. 334 passed, 0 failed, 0 errors
…grep/command and interceptors
pruiz
left a comment
There was a problem hiding this comment.
Follow-up review for Phase A3. The renderer migration looks generally solid and CI is green. I left a few comments on behavior-affecting transition details to fix before moving too far into A4.
1. Sticky sink: key RenderContext cache by sink mode (rich/plain) 2. CLI tunables: eagerly build RenderContext and patch in --read-display-lines / --write-content-lines overrides 3. Dual cache: invalidate new SnapshotCache alongside legacy cache 4. ReadRenderer: use settings.read_highlight_limit instead of hardcoded 200*1024 5. WriteRenderer: gate cache.set on status==completed + !is_error 6. EditRenderer: gate cache.reread on status==completed + !is_error 7. Interceptors: use command parameter directly instead of re-extracting from state['input']['command'] 390 passed, 0 failed, 0 errors
pruiz
left a comment
There was a problem hiding this comment.
Follow-up after the latest A3 fixes: the previous behavior-affecting comments look addressed and CI is green. I think A1–A3 are in good shape. Remaining items before moving further are mostly housekeeping: update the PR body, decide whether the broad codecome.__init__ re-exports are an accepted tradeoff during the migration, and optionally add a small regression test proving CLI render tunables are propagated into RenderSettings. No blockers from my side for the implemented A1–A3 scope.
| color_mode = resolve_color_mode(args.color) | ||
| console = build_console(color_mode) | ||
|
|
||
| # Eagerly build the rendering context so CLI tunable overrides | ||
| # (--read-display-lines, --write-content-lines, etc.) are baked | ||
| # into RenderSettings before any renderer uses them. | ||
| _rendering_ctx = _get_rendering_ctx(console) | ||
| import dataclasses as _dc | ||
| _overrides: dict[str, Any] = {} | ||
| if args.read_display_lines is not None: | ||
| _overrides["read_display_lines"] = args.read_display_lines | ||
| if args.write_content_lines is not None: | ||
| _overrides["write_content_lines"] = args.write_content_lines | ||
| if args.write_diff_limit is not None: | ||
| _overrides["write_diff_limit"] = args.write_diff_limit | ||
| if args.edit_diff_lines is not None: | ||
| _overrides["edit_diff_lines"] = args.edit_diff_lines | ||
| if _overrides: | ||
| _rendering_ctx.settings = _dc.replace(_rendering_ctx.settings, **_overrides) |
There was a problem hiding this comment.
The CLI → RenderSettings bridge now looks fixed. Optional but useful: add a small regression test around this path so flags such as --read-display-lines, --write-content-lines, --write-diff-limit, and --edit-diff-lines are proven to reach the new renderer settings. This was a silent behavior-change risk, so a focused test would help keep it from coming back.
Extract all SSE event renderers from run-agent.py into rendering/events.py: - StepStartRenderer, TextEventRenderer, ReasoningEventRenderer - ToolUseEventRenderer (delegates to FallbackToolRenderer) - StepFinishRenderer, ErrorEventRenderer, SessionStatusRenderer - ServerConnectedRenderer, ServerHeartbeatRenderer - SessionDiffRenderer, MessageUpdatedRenderer - SubagentStatusRenderer (with dedup state) - UnknownEventRenderer (fallback) render_event() dispatcher now routes through the new classes via the rendering context. Old render_* functions kept for backward compat. 390 passed, 0 failed, 0 errors
…solation - Add comprehensive tests for all 13 generic event renderers (tests/test_rendering_events.py: 58 tests) - Cache renderer instances in _get_rendering_ctx() to avoid per-event allocations in render_event() - Cache FallbackToolRenderer in ToolUseEventRenderer.__init__ - Add _reset_subagent_state() helper for test isolation 448 passed, 0 failed
pruiz
left a comment
There was a problem hiding this comment.
Review of the latest Phase A3 batch. CI is green and the generic event renderer migration is mostly aligned with the plan, but I found one behavior-affecting issue: tool_use now appears to route to the generic fallback instead of the migrated tool renderers. I also left a smaller fallback/import safety note and a typo note.
1. Fix ToolUseEventRenderer to dispatch to specific tool renderers (Read/Write/Edit/Command/...) instead of always using FallbackToolRenderer. Falls through to fallback only when the specific renderer returns False. 2. Fix missing closing paren in SubagentStatusRenderer plain heartbeat message. 3. Add regression test proving dataclasses.replace() correctly applies CLI tunable overrides to RenderSettings. 449 passed, 0 failed, 0 errors
1. tools/rendering/utils.py: rename _ROUT_* constants to _ROOT_*
- _ROUT_WORKSPACE_DOCS -> _ROOT_WORKSPACE_DOCS
- _ROUT_WORKSPACE_CONFIGS -> _ROOT_WORKSPACE_CONFIGS
- Fixes naming inconsistency with run-agent.py's _ROOT_WORKSPACE_DOCS
2. tools/run-agent.py: cache StepStartRenderer in ctx._renderers
- Added 'step_start' to pre-cached renderers dict (line 91)
- Updated render_event() to reuse cached instance with updated
phase/label instead of allocating a new instance per event
- All 61 rendering tests pass
Move _consume_events() and _run_single_attempt() from run-agent.py to codecome/runner.py. The runner functions accept render_event_fn and emit_fatal_error_fn as explicit callable parameters rather than importing from codecome.cli or depending on module-level globals. run-agent.py imports the runner functions lazily (inside main()'s loop) to avoid import-time coupling with dynamic-module tests. Tests updated to monkeypatch codecome.runner instead of the run-agent.py module for these two functions. 144 passed, 0 failed, 0 errors (test_run_agent suite)
Document the tools/ directory layout and 10 architecture rules: - Historical scripts are thin wrappers - codecome/config.py is configuration only (no execution) - Event loops under tools/events/ - Renderers under tools/rendering/ - Three sink destinations, one render path - SnapshotCache for diff state - CommandExecutionInterceptor for bash rendering - Finding helpers under tools/findings/ - Dependency direction (no circular imports) - Testing standards 355 passed, 0 failed, 0 errors
- Remove legacy globals and CLI-flag-to-global assignments from cli.py (already handled through RenderSettings/dataclasses.replace) - Reset subagent dedup state before retries in cli.py - Import finish-reason constants from canonical location in cli_render.py - Add documentation comments for modelID vs id in session.py - Refactor ToolUseEventRenderer from if/elif chain to dict + lazy cache - Fix operator precedence bug in TextualRichLogSink.write_text(): 'text + end if end else text' was parsed as 'text + (end if end else text)', doubling the text when end=""
| │ │ ├── sandbox.py # sandbox rendering helpers/interceptor support | ||
| │ │ ├── task.py # TaskRenderer | ||
| │ │ └── skill.py # SkillRenderer | ||
| │ └── command_interceptors/ |
There was a problem hiding this comment.
We should include this into this refactor.
| ├── script-to-asciinema.py # unchanged | ||
| ├── mock-llm-server.py # unchanged | ||
| ├── mock-llm-parity.py # unchanged | ||
| └── mock_llm_scripts/ # unchanged |
There was a problem hiding this comment.
We should rename mock_llm_scripts to mock-llm-scripts
| # launching a real TUI. | ||
| # --------------------------------------------------------------------------- | ||
|
|
||
| def _chat_render_and_log(self, console, phase, label, event): |
There was a problem hiding this comment.
Writting to transcript log is not a concern just of chat app, it is something both chat + phases should do, so actual writing to transcript/log, should be done earlier / upper in the stack/chain from some place common to phases & chat.
| + ", ".join("--" + n.replace("_", "-") for n in missing) | ||
| ) | ||
|
|
||
| check_opencode_version() |
There was a problem hiding this comment.
opencode version dependency is a corncern applying to both phases and chat, so this check should be performed earlier or at an upper layer.
| console = build_console(color_mode) | ||
|
|
||
| # Resolve prompt | ||
| ROOT = Path(__file__).resolve().parents[2] |
There was a problem hiding this comment.
ROOT looks like an app global thing, that should be resolved / set earlier in the process, or at an upper layer, common to both chat & phase runs.
| and os.environ.get("TERM") != "dumb" | ||
| ) | ||
|
|
||
| if _COLOR_ENABLED: |
There was a problem hiding this comment.
Why not using _color.py? what is the benefit of this duplication here?
| @@ -0,0 +1,254 @@ | |||
| # Copyright (C) 2025-2026 Pablo Ruiz García <pablo.ruiz@gmail.com> | |||
There was a problem hiding this comment.
naming is a bit confusing, why 'graceful'? this sounds more la gating.. aint it? (not sure)
| @@ -0,0 +1,53 @@ | |||
| # Copyright (C) 2025-2026 Pablo Ruiz García <pablo.ruiz@gmail.com> | |||
There was a problem hiding this comment.
Why not model this as a class that can be instantiated, used, and passed around, etc. instead of exporting methods and repeat globals like ROOT (which is somethig the class should receive on construction)?
| from events.phase_loop import PhaseEventLoop, RunResult | ||
|
|
||
| return any_step_finish_seen, step_finish_count, last_finish_reason, last_finish_tokens | ||
| __all__ = ["PhaseEventLoop", "RunResult"] |
There was a problem hiding this comment.
I dont see ChatEventLoop here.. is that expected?
| # EventRenderer base | ||
| # --------------------------------------------------------------------------- | ||
|
|
||
| class EventRenderer(BaseRenderer): |
There was a problem hiding this comment.
Aside from base EventRenderer, I think all EventRenderer classes in this file should be split into it's own files inside tools/events/rendering/* or tools/events/rendering/events/*, so they are easier to modify in the future.
- Rename mock_llm_scripts/ to mock-llm-scripts/ (data-only dir) - Remove redundant check_opencode_version() from chat/harness.py - Centralize ROOT in codecome/config.py; remove duplicates from cli_render.py, runner.py, graceful.py, version.py, chat/harness.py - Replace inline color helpers in config.py with _colors import - Add ChatEventLoop to events/__init__.py exports - Update plan: command_interceptors → command/interceptors structure - Update plan: note run-agent.py + codecome.py unification deferred - Add A8 plan document (.project/tools-refactor-a8-plan.md)
…pletion - Add RuntimeConfig dataclass and resolve_runtime_config() to config.py; both cli.py and chat/harness.py now call it instead of duplicating three separate resolve calls - Add --log-level CLI arg (env: OPENCODE_LOG_LEVEL, default: WARN); both phase and chat paths pass it to ServerRunner.start() - Move codecome/graceful.py → phases/completion.py with new phases/ package; update imports in cli.py - Create tools/phases/__init__.py
- Rewrite codecome/transcript.py as Transcript class with for_phase(), for_chat(), null() factory methods, write_event(), and close() - Remove old free functions (open_phase_transcript, open_chat_transcript, close_transcript) — no backward-compat wrappers - Update runner.py to use Transcript.for_phase() and transcript.write_event() - Update chat/harness.py to use Transcript.for_chat() - Update chat/app.py: replace inline json.dumps write with transcript.write_event(), rename transcript_fp param to transcript
…/ package Split the 484-line monolithic events.py into individual modules: - events/base.py: EventRenderer, finish constants, subagent state - events/step_start.py: StepStartRenderer - events/text.py: TextEventRenderer - events/reasoning.py: ReasoningEventRenderer - events/tool_use.py: ToolUseEventRenderer - events/step_finish.py: StepFinishRenderer - events/error.py: ErrorEventRenderer - events/session_status.py: SessionStatusRenderer - events/server.py: ServerConnected + ServerHeartbeat renderers - events/session_diff.py: SessionDiffRenderer - events/message.py: MessageUpdatedRenderer - events/subagent.py: SubagentStatusRenderer - events/unknown.py: UnknownEventRenderer events/__init__.py re-exports all symbols for backward compatibility.
…ucture - Create rendering/dispatch.py with HAVE_RICH, _get_rendering_ctx(), and render_event() — the composition root for rendering - Slim codecome/cli_render.py to CLI-only concerns (build_console, _emit_fatal_error) plus re-exports from rendering.dispatch - Restructure rendering/tools/command.py → rendering/tools/command/__init__.py - Move rendering/tools/interceptors/ → rendering/tools/command/interceptors/ - Update all import paths from rendering.tools.interceptors.* to rendering.tools.command.interceptors.*
- Create codecome/harness.py with run_phase_mode() — the phase-mode retry/resume loop, server lifecycle, and completion reporting - Slim cli.py to ~80 lines: argument parsing + dispatch to run_phase_mode() or _run_chat_mode() - cli.py now mirrors the clean dispatch pattern: parse → version check → show-model | chat | phase
…ion test - Fix test_chat_app.py: transcript_fp → transcript, .write → .write_event - Fix test_codecome_runner.py: use Transcript class, monkeypatch Transcript.for_phase instead of open_phase_transcript - Fix test_command_interceptors.py and test_rendering_tools.py: update import paths from rendering.tools.interceptors → rendering.tools.command.interceptors - Add test_render_settings_propagation.py: verify CLI flags (--read-display-lines, etc.) propagate into RenderSettings via dataclasses.replace, matching the harness pattern
- Update directory layout: add phases/, harness.py, dispatch.py, rendering/events/ package, command/interceptors/ restructure - Update renderer rule: events/ is now a package, one file per family - Update interceptor rule: interceptors live under command/ - Update dependency direction: add phases/ package
tools/ Refactor — Epic A Implementation
Branch:
wip/tools-refactorPlan:
.project/tools-refactor-plan.mdStatus: A1–A8 complete, all tests passing (357)
What this PR does
Decomposes the former ~5,800-line
tools/run-agent.pymonolith into a cleanpackage structure with explicit boundaries, while preserving all existing
behavior (phase runs, chat mode, auto-retry/resume, frontmatter repair).
Phases implemented
codecome/config.py,codecome/session.py,codecome/version.pyrendering/base.py,rendering/context.py,rendering/settings.py,rendering/cache.py,rendering/sink.pyrendering/tools/*.py,rendering/tools/command/interceptors/*.pychat/app.py,chat/harness.py,chat/debug.pycodecome/cli.py,codecome/runner.py—run-agent.pyis now 12 linesevents/base.py,events/phase_loop.py,events/chat_loop.pytools/AGENTS.mdPhase A8 highlights (latest batch)
Addresses all unresolved PR review comments:
mock_llm_scripts/→mock-llm-scripts/ROOTincodecome/config.py; remove 6 duplicate definitionsRuntimeConfig+resolve_runtime_config()--log-levelCLI arg (env:OPENCODE_LOG_LEVEL)Transcript.for_phase(),.for_chat(),.write_event(),.close()rendering/events.py(484 lines) into 13-filerendering/events/packagerender_event,_get_rendering_ctx) torendering/dispatch.pyrendering/tools/command/interceptors/cli.pyintocodecome/harness.pycodecome/graceful.py→phases/completion.pycheck_opencode_version()from chat harnessconfig.pywith_colorsimportChatEventLooptoevents/__init__.pyexportstools/AGENTS.mdfor new architectureArchitecture (after A8)
Testing
py_compilechecks passingDeferred items
run-agent.py+codecome.pyinto single CLI → Phase 2 (Epic B)