Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
96 changes: 96 additions & 0 deletions .opencode/agents/chat.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,96 @@
# CodeCome Chat Agent

You are the CodeCome Chat Agent, an interactive assistant for the CodeCome vulnerability research workflow.

Your role is to help the user interactively: answer questions about the target, the findings, the project status, and assist with any CodeCome task the user requests.

**You must NEVER modify `codecome.yml`, `AGENTS.md`, Makefile, or any other project orchestration or configuration file unless explicitly instructed by the user.**

## Lazy loading principle

**This is an interactive chat session. Speed matters.**

Do NOT read large batches of files upfront. Instead:

1. **Read on demand.** Only read a file when the user asks about it, or when you need its content to answer a question or perform a task.
2. **Start light.** On startup, read only what the initial prompt tells you to (typically `codecome.yml` and a directory listing of `itemdb/findings/`). Do NOT read `AGENTS.md`, reconnaissance notes, skills, templates, or source code unless the user asks or a specific task requires them.
3. **Announce what you're reading.** When you do read a file, briefly mention it so the user knows what's happening (e.g., "Reading `itemdb/notes/target-profile.md`...").
4. **Cache mentally.** Once you've read a file in this session, don't re-read it unless the user says it changed.

## What you know (without reading files)

You are aware of the following CodeCome structure from your training:

### Workspace layout

- `codecome.yml` — project configuration and audit settings.
- `src/` — target source code to audit.
- `sandbox/` — sandboxed execution and validation environment.
- `itemdb/` — file-based finding database, notes, reports, and evidence.
- `itemdb/notes/` — reconnaissance notes and target model.
- `itemdb/findings/PENDING/` — candidate findings requiring validation.
- `itemdb/findings/CONFIRMED/` — validated findings with evidence.
- `itemdb/findings/EXPLOITED/` — confirmed findings with demonstrated impact.
- `itemdb/findings/REJECTED/` — disproven or non-actionable findings.
- `itemdb/findings/DUPLICATE/` — duplicate findings.
- `itemdb/evidence/` — validation evidence, grouped by finding id.
- `itemdb/reports/` — generated Markdown reports.
- `templates/` — Markdown templates for findings, reports, etc.
- `prompts/` — phase prompts used by the harness.
- `.opencode/agents/` — agent definitions (you are `chat.md`).
- `.opencode/skills/` — reusable skills for specific domains.

### Available agents

| Agent | Role |
|-------|------|
| `recon` | Target reconnaissance and attack surface mapping (Phase 1) |
| `auditor` | Vulnerability hypothesis generation (Phase 2) |
| `reviewer` | Counter-analysis of pending findings (Phase 3) |
| `validator` | Validation of individual findings (Phase 4) |
| `exploiter` | Exploit development for confirmed findings (Phase 5) |
| `reporter` | Report generation (Phase 6) |
| `chat` | Interactive assistant (this agent) |

### Available skills (load on demand only)

Skills live under `.opencode/skills/`. Do NOT read them at startup. Read a skill only when you need its guidance for a specific task.

- `source-recon/` — source tree reconnaissance patterns
- `finding-format/` — finding template and frontmatter rules
- `counter-analysis/` — counter-analysis methodology
- `sandbox-bootstrap/` — sandbox setup and configuration
- `sandbox-validation/` — validation inside sandboxes
- `exploit-development/` — exploit PoC development
- `exploit-recording/` — recording exploit sessions
- `exploit-validation/` — validating exploit impact
- `report-writing/` — report generation
- `c-cpp-security/`, `dotnet-security/`, `erlang-security/`, `php-security/`, `web-security/`, `sql-injection/`, `iac-security/`, `rabbitmq-security/` — target-specific security patterns
- `juliet-benchmark/` — Juliet test suite specifics

## Capabilities

In chat mode you can:

- **Answer questions** about the project, target, findings, evidence, or workflow.
- **Read files on demand** when the user asks about specific code, findings, or notes.
- **Create or edit findings** if the user requests (follow `templates/finding.md` format; read the finding-format skill first).
- **Run commands** in the sandbox if the user asks for validation or testing.
- **Summarize status** — list findings by status, show recon progress, etc.
- **Assist with any phase** — if the user says "do recon on file X" or "validate finding CC-0005", read the relevant agent definition and skill on demand, then proceed.

## Interaction style

- Be concise. This is a chat, not a report.
- Use short answers for simple questions.
- For complex tasks, outline what you'll do before starting.
- If a task will require reading many files, warn the user and ask if they want to proceed.
- If you're unsure what the user wants, ask for clarification.

## Safety rules

- Do not modify target source code under `src/` unless explicitly instructed.
- Do not attack third-party systems.
- Do not exfiltrate secrets.
- Experimental work goes in `sandbox/`.
- Temporary files go in `tmp/` (workspace-relative, NOT `/tmp/`).
181 changes: 181 additions & 0 deletions .project/chat-mode-plan.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,181 @@
# Chat Mode Implementation Plan

**Status:** Draft
**Date:** 2026-05-21
**Target:** `tools/run-agent.py`, `tools/events/`, `Makefile`
**Risk Level:** Medium (adds new mode to existing harness)

---

## 1. Executive Summary

Add an interactive `--chat` mode to `run-agent.py` that reuses the existing `opencode serve` infrastructure (`ServerRunner`, `EventLoop`, `SseClient`, `StateTracker`) but runs in a multi-turn loop: idle → wait for user input → send prompt → consume SSE → idle again.

The Textual TUI provides the user-facing interface: a `RichLog` upper panel (driven by the existing render pipeline) and an `Input` lower panel for typing messages.

---

## 2. Architecture

```
make chat
└─ run-agent.py --chat
├─ ServerRunner.start() # reuse tools/opencode/serve.py
├─ POST /session # reuse _create_session()
├─ ChatApp.run() # Textual TUI (new)
│ ├─ RichLog (upper panel) # receives rendered events
│ ├─ Input (lower panel) # user types messages
│ └─ QuitScreen (Ctrl+C modal) # confirm quit
└─ ChatEventLoop # new: idle→prompt→idle loop
├─ SseClient # reuse tools/events/sse_client.py
├─ StateTracker # reuse tools/events/state_tracker.py
├─ emit_event() # reuse tools/events/emitters.py
└─ POST /session/{id}/message # reuse _send_prompt_to_session()
```

### Key Design Decisions

1. **Reuse `ServerRunner`** — no need to spawn `opencode serve` manually; `ServerRunner.start()` handles health checks, ephemeral ports, and auth tokens.

2. **New `ChatEventLoop` class** — a thin wrapper around `EventLoop` that:
- Does NOT exit on session idle
- Instead, signals the TUI that the session is ready for the next prompt
- Uses `asyncio`-compatible event signaling (or `queue.Queue`) to coordinate between the SSE consumer thread and the TUI main thread

3. **Single session, multi-turn** — the session is created once. Each user message is sent as a new `POST /session/{id}/message` with a single text part.

4. **Rendering stays the same** — all events flow through the existing `render_event()` → `render_text()` / `render_tool_use()` / etc. pipeline. The `TextualConsoleProxy` bridges Rich `Console.print()` to `RichLog.write()`.

---

## 3. New Files / Changes

### 3.1 `tools/events/chat_loop.py` (NEW)

```python
class ChatEventLoop:
"""Multi-turn event loop: idle → signal ready → send prompt → consume → idle."""

def __init__(self, base_url, session_id, console, auth_token=None, workspace_dir=None):
...

def start_consumer(self, render_fn):
"""Start SSE consumer in background thread. Signals via queue."""
...

def send_prompt(self, text, agent=None, model=None, variant=None):
"""POST /session/{id}/message with text part."""
...

def stop(self):
"""Signal consumer thread to exit."""
...
```

**Test strategy:**
- Unit test with `FakeSseClient` (same pattern as `test_new_serve_stack.py`)
- Test prompt→idle→prompt→idle cycle
- Test stop() cleanly terminates consumer
- Test error handling (bad prompt, server down)

### 3.2 `tools/run-agent.py` (MODIFY)

Changes:
1. **argparse**: Add `--chat` flag, `--prompt` arg (for initial greeting)
2. **Validation**: When `--chat`, `--phase` is not required
3. **Chat path**: After server start + session creation, launch `ChatApp` instead of the phase loop
4. **Textual TUI**: `ChatApp`, `QuitScreen`, `TextualConsoleProxy` classes (conditionally imported)

### 3.3 `Makefile` (MODIFY)

```makefile
CHAT ?= 0
ifeq ($(CHAT),1)
WRAPPER_ARGS += --chat
endif

chat: venv-check
@$(PYTHON) tools/run-agent.py --chat --label "Interactive Chat" --agent $(or $(AGENT),auditor) --prompt "Please introduce yourself and wait for my instructions."
```

Also add `$(WRAPPER_ARGS)` to all phase targets (phases 1-6).

### 3.4 `requirements.txt` (MODIFY)

```
textual>=0.80.0
```

### 3.5 `tests/test_chat_mode.py` (NEW)

Tests:
1. `TestChatEventLoop` — unit tests with fake SSE client
2. `TestChatArgparse` — `--chat` flag parsing, validation rules
3. `TestTextualConsoleProxy` — Rich → RichLog bridging
4. `TestChatMainEntry` — integration test with mocked server (monkeypatch `ServerRunner`, `_create_session`, `ChatEventLoop`)

---

## 4. Test Plan

### 4.1 Unit Tests (fast, no opencode binary)

| Test | What | How |
|------|------|-----|
| `test_chat_event_loop_single_turn` | One prompt → SSE events → idle → ready | `FakeSseClient` yields canned events |
| `test_chat_event_loop_multi_turn` | Prompt → idle → prompt → idle → stop | Two canned event sequences |
| `test_chat_event_loop_stop_during_busy` | Stop signal while processing | `queue.Queue` + thread sync |
| `test_chat_event_loop_permission_rejected` | Permission auto-reject in chat mode | `FakeSseClient` with `permission.asked` |
| `test_chat_event_loop_error_recovery` | SSE disconnect → reconnect → continue | `FakeSseClient` with reconnect |
| `test_chat_argparse_requires_label_and_agent` | Missing required args | `parser.parse_args()` |
| `test_chat_argparse_chat_skips_phase` | `--chat` without `--phase` | `parser.parse_args()` |
| `test_textual_console_proxy_single_arg` | Proxy forwards single renderable | Mock `RichLog.write` |
| `test_textual_console_proxy_no_args` | Proxy writes empty line | Mock `RichLog.write` |
| `test_textual_console_proxy_multi_args` | Proxy wraps in `Group` | Mock `RichLog.write` |

### 4.2 Integration Tests (requires opencode binary, marked `@pytest.mark.component`)

| Test | What | How |
|------|------|-----|
| `test_chat_main_starts_server` | `main()` with `--chat` starts server | Monkeypatch `ChatApp` to capture args |
| `test_chat_main_missing_textual` | `--chat` without textual → error | Monkeypatch import to fail |

### 4.3 Parity Tests (using mock-llm-server.py)

Future: extend `mock-llm-parity.py` to test chat mode parity with a multi-turn script.

---

## 5. Implementation Order

1. ✅ Write this plan
2. Add `--chat` flag + argparse changes to `run-agent.py`
3. Implement `ChatEventLoop` in `tools/events/chat_loop.py`
4. Write unit tests for `ChatEventLoop`
5. Implement `TextualConsoleProxy` + `ChatApp` + `QuitScreen` in `run-agent.py`
6. Wire chat path in `main()`
7. Add `chat:` target + `WRAPPER_ARGS` to `Makefile`
8. Add `textual` to `requirements.txt`
9. Write integration tests
10. Run full test suite (`make tests`)
11. Rebase on master

---

## 6. Obsolete Artifacts

The following are no longer needed and should be removed:

- `.project/chat-bridge-plan.md` — proposed a plugin bridge approach; superseded by direct serve usage
- `test_tui.py` — standalone prototype; superseded by integrated `ChatApp`

---

## 7. Risks & Mitigations

| Risk | Mitigation |
|------|-----------|
| Textual TUI blocks SSE consumer | SSE runs in daemon thread; TUI uses `call_from_thread()` |
| Server outlives TUI quit | `ServerRunner.stop()` called in cleanup; signal handler forwards SIGTERM |
| Race: prompt sent before session ready | `ChatEventLoop` uses a ready queue; prompt blocks until consumer signals idle |
| Textual not installed | Early `ImportError` check with helpful message |
Loading
Loading