From 076ed431d504689b80d20426b130af285bc57177 Mon Sep 17 00:00:00 2001
From: Georges Garnier
Date: Mon, 27 Apr 2026 13:17:47 +0200
Subject: [PATCH 01/11] =?UTF-8?q?feat(p4):=20native=20tools=20for=20agents?=
 =?UTF-8?q?=20=E2=80=94=20bash=20+=20file-write=20inside=20the=20sandbox?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Agents can now call two native tools by emitting fenced forge:* blocks
in their reply. The runtime parses the first block per turn, executes
it, feeds the structured result back as a user message, and loops up to
maxTurns (capped at 10).

Tools live in tools-core/src/runtime/ and stay distinct from the
host-side FileWrite: they are sandboxed to /workspace, overwrite by
default (in-sandbox iteration), and surface stdout/stderr/exit/timed-out
to the LLM.

DockerLaunch now bind-mounts a per-run host directory at /workspace so
the agent has somewhere writable, and so artifacts survive the container
(used later in P5 for extraction).

Runtime mode switches automatically based on AGENT.md.maxTurns: single
turn keeps the P3 one-shot path, multi-turn enables the tool loop and
appends a TOOLS section to the system prompt explaining the protocol.

Tool output is wrapped in [forge:tool] / [/forge:tool] markers on stdout
so the host TUI can route it to its action card instead of mixing it
with prose.

Tests:
- agent-side parser (none / tool / invalid / first-block-only)
- runtime FileWrite path traversal, sandbox escape, overwrite
- runtime Bash stdout/stderr/exit/timeout, cwd respected

Tests use a FORGE_WORKSPACE override so they don't try to touch
/workspace on the host.
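For reviewers who want the control flow at a glance, here is a minimal sketch of the loop described in this message: parse the first fenced block, run the tool, feed the result back as a user message, stop on plain text or at the turn cap. It is a simplification under stated assumptions, not the code in the diff: `parseFirst`, `runLoop`, `Parsed`, `Message` and `HARD_CAP` are hypothetical names, the model and the tool runner are injected stubs, Zod validation is omitted, and the loop is synchronous for brevity where the real runtime is async and streams to stdout.

```typescript
// Hypothetical, simplified stand-ins for the runtime pieces in this patch.
type Tag = 'bash' | 'write'
type Parsed =
  | { kind: 'none' }
  | { kind: 'invalid'; error: string }
  | { kind: 'tool'; tag: Tag; input: unknown }
type Message = { role: 'user' | 'assistant'; content: string }

// Assumed to mirror the hard cap of 10 mentioned in the commit message.
const HARD_CAP = 10
// Build the fence marker programmatically so this example embeds cleanly
// inside another fenced block.
const TICKS = '`'.repeat(3)
const FENCE = new RegExp(TICKS + 'forge:(bash|write)\\s*\\n([\\s\\S]*?)' + TICKS)

// Only the FIRST block in a reply is considered; later blocks are ignored.
function parseFirst(reply: string): Parsed {
  const m = FENCE.exec(reply)
  if (!m) return { kind: 'none' }
  try {
    return { kind: 'tool', tag: m[1] as Tag, input: JSON.parse(m[2] ?? '') }
  } catch (err) {
    return { kind: 'invalid', error: err instanceof Error ? err.message : String(err) }
  }
}

// `model` and `runTool` are injected so the loop can be exercised without
// an LLM endpoint or a container.
function runLoop(
  prompt: string,
  maxTurns: number,
  model: (messages: Message[]) => string,
  runTool: (tag: Tag, input: unknown) => string,
): { messages: Message[]; final: string } {
  const messages: Message[] = [{ role: 'user', content: prompt }]
  const turns = Math.min(maxTurns, HARD_CAP)
  let final = ''
  for (let turn = 0; turn < turns; turn += 1) {
    final = model(messages)
    const parsed = parseFirst(final)
    // Plain text means the agent is done.
    if (parsed.kind === 'none') break
    // Keep the assistant reply so the next turn sees its own tool call.
    messages.push({ role: 'assistant', content: final })
    const toolReply =
      parsed.kind === 'invalid'
        ? `[forge:tool error] ${parsed.error}`
        : runTool(parsed.tag, parsed.input)
    // Tool results go back to the model as a user message.
    messages.push({ role: 'user', content: toolReply })
  }
  return { messages, final }
}
```

Feeding an invalid block back as an error message, rather than aborting, gives the model a chance to repair its JSON on the next turn, at the cost of one turn from the budget.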
--- packages/runtime/src/index.ts | 146 +++++++++++++++--- packages/runtime/src/tool-protocol.ts | 135 ++++++++++++++++ packages/runtime/tests/tool-protocol.test.ts | 74 +++++++++ packages/tools-core/src/docker-launch.ts | 18 ++- packages/tools-core/src/index.ts | 17 ++ packages/tools-core/src/runtime/bash.ts | 104 +++++++++++++ packages/tools-core/src/runtime/file-write.ts | 76 +++++++++ .../tools-core/tests/runtime-bash.test.ts | 52 +++++++ .../tests/runtime-file-write.test.ts | 99 ++++++++++++ 9 files changed, 696 insertions(+), 25 deletions(-) create mode 100644 packages/runtime/src/tool-protocol.ts create mode 100644 packages/runtime/tests/tool-protocol.test.ts create mode 100644 packages/tools-core/src/runtime/bash.ts create mode 100644 packages/tools-core/src/runtime/file-write.ts create mode 100644 packages/tools-core/tests/runtime-bash.test.ts create mode 100644 packages/tools-core/tests/runtime-file-write.test.ts diff --git a/packages/runtime/src/index.ts b/packages/runtime/src/index.ts index 19d2732..d8b3ab1 100644 --- a/packages/runtime/src/index.ts +++ b/packages/runtime/src/index.ts @@ -4,19 +4,33 @@ // // 1. Standalone (P1) : reads a prompt from stdin, calls an OpenAI- // compatible LLM endpoint, streams the answer to stdout. No agent -// configuration required. +// configuration required, no tool loop. // -// 2. Agent mode (P3.4) : if an AGENT.md is mounted at /agent/AGENT.md, -// its frontmatter overrides the model and its body becomes the -// system prompt. The prompt from stdin is the user message. +// 2. Agent mode (P3+) : reads /agent/AGENT.md (frontmatter overrides +// the model, body becomes the system prompt). The user prompt comes +// from stdin. Native tools are available via fenced forge:* blocks +// (P4) — the runtime parses them, executes the tool, feeds the +// result back into the conversation, and loops up to maxTurns. // -// The output is STREAMED token by token to stdout so the host can render -// progress live in the TUI. 
+// Output is STREAMED token by token to stdout so the host can render +// progress live in the TUI. Tool results are also written to stdout +// inside [forge:tool] markers so the host can show them in Mission +// Control without re-running the parser. import { readFileSync } from 'node:fs' import { createOpenAI } from '@ai-sdk/openai' import { parseAgentMd } from '@agent-forge/core/types' -import { streamText } from 'ai' +import { + executeBash, + executeRuntimeFileWrite, +} from '@agent-forge/tools-core' +import { type CoreMessage, streamText } from 'ai' +import { + parseFirstToolBlock, + renderBashResult, + renderInvalid, + renderWriteResult, +} from './tool-protocol.ts' const AGENT_MD_PATH = '/agent/AGENT.md' @@ -25,16 +39,18 @@ const API_KEY = process.env.FORGE_API_KEY ?? 'not-needed' const ENV_MODEL = process.env.FORGE_MODEL ?? 'mlx-community/Qwen2.5-7B-Instruct-4bit' const MAX_TOKENS = Number(process.env.FORGE_MAX_TOKENS ?? '1024') +// Hard cap to prevent runaway loops even if AGENT.md says otherwise. +const MAX_TURNS_HARD_CAP = 10 type AgentConfig = { model: string systemPrompt?: string agentName?: string + maxTurns: number } function loadAgentConfig(): AgentConfig { - // Default config when no AGENT.md is mounted (standalone P1 mode). - let config: AgentConfig = { model: ENV_MODEL } + let config: AgentConfig = { model: ENV_MODEL, maxTurns: 1 } try { const raw = readFileSync(AGENT_MD_PATH, 'utf8') const parsed = parseAgentMd(raw) @@ -42,11 +58,9 @@ function loadAgentConfig(): AgentConfig { model: parsed.meta.model ?? ENV_MODEL, systemPrompt: parsed.body.length > 0 ? parsed.body : undefined, agentName: parsed.meta.name, + maxTurns: Math.min(parsed.meta.maxTurns ?? 1, MAX_TURNS_HARD_CAP), } } catch (err) { - // ENOENT means standalone mode, that is fine. Anything else is fatal : - // a malformed AGENT.md would otherwise silently fall back to the - // default model + no system prompt, which is misleading. 
const code = (err as NodeJS.ErrnoException).code if (code !== 'ENOENT') { console.error( @@ -68,28 +82,112 @@ async function readStdin(): Promise { return Buffer.concat(chunks).toString('utf8').trim() } +const TOOL_PROMPT = ` + +You have access to two native tools, callable by emitting a fenced block in your reply. + +## forge:bash — execute a shell command + +\`\`\`forge:bash +{ "command": "ls -la", "timeoutMs": 10000 } +\`\`\` + +The command runs via \`bash -lc\` inside /workspace. \`timeoutMs\` is optional (default 30000, max 120000). The result (stdout, stderr, exit code) will be injected back into the conversation on the next turn. + +## forge:write — write a file in /workspace + +\`\`\`forge:write +{ "path": "src/index.ts", "content": "export const x = 1\\n" } +\`\`\` + +\`path\` is relative to /workspace (or an absolute path under /workspace). Existing files are overwritten. The result (absolute path, bytes written, or an error) will be injected back into the conversation on the next turn. + +## Iteration + +- Emit at most ONE block per reply. Anything you write before the block is shown to the user. Anything after the block is discarded. +- After you receive a tool result, decide whether you need another tool call or whether you can produce the final answer. +- When you are done, reply with plain text (no fenced block). +` + +function buildSystem(config: AgentConfig, hasTools: boolean): string | undefined { + const base = config.systemPrompt ?? '' + if (!hasTools) return base.length > 0 ? base : undefined + return base.length > 0 ? 
`${base}${TOOL_PROMPT}` : TOOL_PROMPT.trim() +} + +async function streamOneTurn( + provider: ReturnType<typeof createOpenAI>, + model: string, + system: string | undefined, + messages: CoreMessage[], +): Promise<string> { + const result = streamText({ + model: provider(model), + system, + messages, + maxTokens: MAX_TOKENS, + }) + let acc = '' + for await (const chunk of result.textStream) { + process.stdout.write(chunk) + acc += chunk + } + return acc +} + +async function executeToolBlock( + parsed: Extract<ReturnType<typeof parseFirstToolBlock>, { kind: 'tool' }>, +): Promise<string> { + const tool = parsed.tool + if (tool.kind === 'bash') { + const result = await executeBash(tool.input) + return renderBashResult(tool.input, result) + } + // tool.kind === 'write' + const result = executeRuntimeFileWrite(tool.input) + return renderWriteResult(tool.input, result) +} + async function main(): Promise<void> { const config = loadAgentConfig() - const prompt = await readStdin() - if (!prompt) { + const userPrompt = await readStdin() + if (!userPrompt) { console.error('✗ no prompt received on stdin') process.exit(1) } const provider = createOpenAI({ baseURL: BASE_URL, apiKey: API_KEY }) + const hasTools = config.maxTurns > 1 + const system = buildSystem(config, hasTools) - const result = streamText({ - model: provider(config.model), - system: config.systemPrompt, - prompt, - maxTokens: MAX_TOKENS, - }) + const messages: CoreMessage[] = [{ role: 'user', content: userPrompt }] - for await (const chunk of result.textStream) { - process.stdout.write(chunk) + for (let turn = 0; turn < config.maxTurns; turn += 1) { + const reply = await streamOneTurn(provider, config.model, system, messages) + process.stdout.write('\n') + + if (!hasTools) break + + const parsed = parseFirstToolBlock(reply) + if (parsed.kind === 'none') break + + // Record what the LLM just said (text + raw block) so the next turn + // sees it as a real assistant message. 
+ messages.push({ role: 'assistant', content: reply }) + + let toolReply: string + if (parsed.kind === 'invalid') { + toolReply = renderInvalid(parsed.error) + } else { + toolReply = await executeToolBlock(parsed) + } + + // Mark tool output for the host TUI so it can render it inside the + // Mission Control card instead of mixing it with prose. + process.stdout.write(`\n[forge:tool]\n${toolReply}\n[/forge:tool]\n`) + + messages.push({ role: 'user', content: toolReply }) } - // Trailing newline so the host can detect the end of the stream cleanly. - process.stdout.write('\n') } main().catch((err) => { diff --git a/packages/runtime/src/tool-protocol.ts b/packages/runtime/src/tool-protocol.ts new file mode 100644 index 0000000..5942b47 --- /dev/null +++ b/packages/runtime/src/tool-protocol.ts @@ -0,0 +1,135 @@ +// Agent-side tool protocol — fenced blocks the agent emits to invoke a +// native tool, and the rendering of tool results back to the LLM. +// +// We deliberately mirror the builder's text-structured protocol (forge:write +// and forge:run) instead of using OpenAI tool_calls for two reasons : +// 1. Local LLMs (MLX, llama.cpp) often don't honor tool_calls. +// 2. A consistent protocol across builder and agents simplifies debugging +// and lets users read the raw stream. +// +// Block grammar : +// +// ```forge:bash +// { "command": "ls -la" } +// ``` +// +// ```forge:write +// { "path": "src/index.ts", "content": "..." } +// ``` +// +// Only ONE block is parsed per turn (the first encountered). Everything +// before the block is treated as the agent's "thinking out loud" text and +// streamed to the host. Everything after the block is dropped — the agent +// will see the tool result on the next turn and continue from there. 
+ +import { z } from 'zod' +import { + BashInputSchema, + RuntimeFileWriteInputSchema, + type BashInput, + type BashResult, + type RuntimeFileWriteInput, + type RuntimeFileWriteResult, +} from '@agent-forge/tools-core' + +export type ParsedTool = + | { kind: 'bash'; input: BashInput; raw: string } + | { kind: 'write'; input: RuntimeFileWriteInput; raw: string } + +export type ParseOutcome = + | { kind: 'none'; text: string } + | { kind: 'invalid'; text: string; error: string; raw: string } + | { kind: 'tool'; text: string; tool: ParsedTool } + +const FENCE_RE = /```forge:(bash|write)\s*\n([\s\S]*?)```/ + +export function parseFirstToolBlock(stream: string): ParseOutcome { + const m = FENCE_RE.exec(stream) + if (!m) return { kind: 'none', text: stream } + + const tag = m[1] as 'bash' | 'write' + const body = m[2] ?? '' + const before = stream.slice(0, m.index) + + let parsed: unknown + try { + parsed = JSON.parse(body) + } catch (err) { + return { + kind: 'invalid', + text: before, + error: `forge:${tag} block is not valid JSON : ${ + err instanceof Error ? 
err.message : String(err) + }`, + raw: m[0], + } + } + + if (tag === 'bash') { + const result = BashInputSchema.safeParse(parsed) + if (!result.success) { + return { + kind: 'invalid', + text: before, + error: `forge:bash input failed validation : ${formatZodError(result.error)}`, + raw: m[0], + } + } + return { + kind: 'tool', + text: before, + tool: { kind: 'bash', input: result.data, raw: m[0] }, + } + } + + // tag === 'write' + const result = RuntimeFileWriteInputSchema.safeParse(parsed) + if (!result.success) { + return { + kind: 'invalid', + text: before, + error: `forge:write input failed validation : ${formatZodError(result.error)}`, + raw: m[0], + } + } + return { + kind: 'tool', + text: before, + tool: { kind: 'write', input: result.data, raw: m[0] }, + } +} + +function formatZodError(err: z.ZodError): string { + return err.errors + .map((e) => `${e.path.join('.') || '(root)'}: ${e.message}`) + .join(' ; ') +} + +// Render a tool result as the message we feed back to the LLM on the next +// turn. We use the same fenced format so the agent can recognize it as +// "the result of MY previous call". +export function renderBashResult( + input: BashInput, + result: BashResult, +): string { + const head = `[forge:bash result] command="${input.command}" exit=${result.exitCode.toString()}${ + result.timedOut ? ' (timed out)' : '' + }` + const stdout = result.stdout.length > 0 ? `\n--- stdout ---\n${result.stdout}` : '' + const stderr = result.stderr.length > 0 ? 
`\n--- stderr ---\n${result.stderr}` : '' + return `${head}${stdout}${stderr}` +} + +export function renderWriteResult( + input: RuntimeFileWriteInput, + result: RuntimeFileWriteResult, +): string { + if (result.ok) { + return `[forge:write result] wrote ${result.absolutePath} (${result.bytes.toString()} bytes)` + } + return `[forge:write result] FAILED on path="${input.path}" : ${result.error}` +} + +export function renderInvalid(error: string): string { + return `[forge:tool error] ${error}\n\nFix the JSON or schema and try again.` +} diff --git a/packages/runtime/tests/tool-protocol.test.ts b/packages/runtime/tests/tool-protocol.test.ts new file mode 100644 index 0000000..e699fd7 --- /dev/null +++ b/packages/runtime/tests/tool-protocol.test.ts @@ -0,0 +1,74 @@ +// Tests for the agent-side tool block parser. Pure : no FS, no spawn. + +import { describe, expect, test } from 'bun:test' +import { parseFirstToolBlock } from '../src/tool-protocol.ts' + +describe('parseFirstToolBlock', () => { + test('returns kind=none on plain text', () => { + const r = parseFirstToolBlock('just a sentence with no block') + expect(r.kind).toBe('none') + }) + + test('parses a forge:bash block with prose before it', () => { + const stream = [ + 'I will list the workspace contents.', + '', + '```forge:bash', + '{ "command": "ls -la" }', + '```', + '', + 'After the block — should be ignored.', + ].join('\n') + const r = parseFirstToolBlock(stream) + expect(r.kind).toBe('tool') + if (r.kind === 'tool') { + expect(r.text.startsWith('I will list')).toBe(true) + expect(r.tool.kind).toBe('bash') + if (r.tool.kind === 'bash') expect(r.tool.input.command).toBe('ls -la') + } + }) + + test('parses a forge:write block', () => { + const stream = [ + '```forge:write', + '{ "path": "notes.md", "content": "# hi\\n" }', + '```', + ].join('\n') + const r = parseFirstToolBlock(stream) + expect(r.kind).toBe('tool') + if (r.kind === 'tool' && r.tool.kind === 'write') { + 
expect(r.tool.input.path).toBe('notes.md') + expect(r.tool.input.content).toBe('# hi\n') + } + }) + + test('returns kind=invalid when JSON is malformed', () => { + const stream = '```forge:bash\n{ not json }\n```' + const r = parseFirstToolBlock(stream) + expect(r.kind).toBe('invalid') + if (r.kind === 'invalid') expect(r.error).toContain('not valid JSON') + }) + + test('returns kind=invalid when schema is wrong', () => { + const stream = '```forge:bash\n{ "command": "" }\n```' + const r = parseFirstToolBlock(stream) + expect(r.kind).toBe('invalid') + if (r.kind === 'invalid') expect(r.error).toContain('failed validation') + }) + + test('only the first block matters', () => { + const stream = [ + '```forge:bash', + '{ "command": "echo a" }', + '```', + '```forge:bash', + '{ "command": "echo b" }', + '```', + ].join('\n') + const r = parseFirstToolBlock(stream) + expect(r.kind).toBe('tool') + if (r.kind === 'tool' && r.tool.kind === 'bash') { + expect(r.tool.input.command).toBe('echo a') + } + }) +}) diff --git a/packages/tools-core/src/docker-launch.ts b/packages/tools-core/src/docker-launch.ts index c7f5594..5544e00 100644 --- a/packages/tools-core/src/docker-launch.ts +++ b/packages/tools-core/src/docker-launch.ts @@ -9,7 +9,7 @@ // agents can run in parallel without collision. import { spawn, spawnSync } from 'node:child_process' -import { existsSync } from 'node:fs' +import { existsSync, mkdirSync } from 'node:fs' import { join } from 'node:path' import { z } from 'zod' import { FORGE_HOME } from './file-write.ts' @@ -75,6 +75,16 @@ export function launchAgent(input: DockerLaunchInput): LaunchHandle { spawnSync('docker', ['rm', '-f', containerName], { stdio: 'ignore' }) } + // Per-run workspace on the host, bind-mounted RW into the container so + // tools (forge:bash, forge:write) have a sandbox they can scribble in. + // Kept after the container exits — useful for debugging and for P5 + // artifact extraction. 
+ const workspaceHostDir = join( + FORGE_HOME, + 'workspaces', + containerName, + ) + async function* run(): AsyncGenerator { if (!existsSync(agentMdPath)) { yield { type: 'error', error: `AGENT.md not found : ${agentMdPath}` } @@ -90,6 +100,8 @@ export function launchAgent(input: DockerLaunchInput): LaunchHandle { return } + mkdirSync(workspaceHostDir, { recursive: true }) + const args = [ 'run', '--rm', @@ -100,6 +112,10 @@ export function launchAgent(input: DockerLaunchInput): LaunchHandle { `${agentMdPath}:/agent/AGENT.md:ro`, '-v', `${RUNTIME_DIST_FROM_TOOLS}:/runtime:ro`, + '-v', + `${workspaceHostDir}:/workspace`, + '-w', + '/workspace', ...inheritEnv(), IMAGE, 'node', diff --git a/packages/tools-core/src/index.ts b/packages/tools-core/src/index.ts index 0fb06f6..ac20361 100644 --- a/packages/tools-core/src/index.ts +++ b/packages/tools-core/src/index.ts @@ -21,3 +21,20 @@ export { type DockerLaunchInput, type LaunchHandle, } from './docker-launch.ts' + +// Runtime-side tools — used INSIDE the agent's container, sandboxed to +// /workspace. Distinct from the host-side FileWrite above. +export { + BashInputSchema, + WORKSPACE_DIR, + executeBash, + type BashInput, + type BashResult, +} from './runtime/bash.ts' + +export { + RuntimeFileWriteInputSchema, + executeRuntimeFileWrite, + type RuntimeFileWriteInput, + type RuntimeFileWriteResult, +} from './runtime/file-write.ts' diff --git a/packages/tools-core/src/runtime/bash.ts b/packages/tools-core/src/runtime/bash.ts new file mode 100644 index 0000000..c039d42 --- /dev/null +++ b/packages/tools-core/src/runtime/bash.ts @@ -0,0 +1,104 @@ +// Bash — execute a shell command inside an agent's container. +// +// Runs INSIDE the container (called from @agent-forge/runtime). Wraps the +// command with `bash -lc` so simple shell features (pipes, &&, $VAR) just +// work. The cwd is locked to /workspace : the agent never sees anything +// outside its sandbox. 
A timeout (default 30s) prevents runaway commands +// from blocking the tool loop. +// +// Returns a structured result (stdout, stderr, exitCode, timedOut). The +// caller is responsible for formatting it back into a message the LLM will +// read on the next turn. + +import { spawn } from 'node:child_process' +import { z } from 'zod' + +export const WORKSPACE_DIR = '/workspace' + +// Tests on the host don't have /workspace. The runtime always uses +// WORKSPACE_DIR when running inside the container ; tests can point this +// at a temp dir via FORGE_WORKSPACE. +function bashCwd(): string { + return process.env.FORGE_WORKSPACE ?? WORKSPACE_DIR +} + +export const BashInputSchema = z.object({ + command: z + .string() + .min(1) + .describe( + 'Shell command to execute inside the agent sandbox. Run via `bash -lc`. The current directory is /workspace.', + ), + timeoutMs: z + .number() + .int() + .positive() + .max(120_000) + .optional() + .describe('Hard timeout in milliseconds. Defaults to 30000. Capped at 120000.'), +}) + +export type BashInput = z.infer<typeof BashInputSchema> + +export type BashResult = { + stdout: string + stderr: string + exitCode: number + timedOut: boolean +} + +const DEFAULT_TIMEOUT_MS = 30_000 +// Cap captured streams so a runaway command can't blow the LLM context. +const MAX_OUTPUT_BYTES = 16_384 + +function clip(text: string): string { + if (Buffer.byteLength(text, 'utf8') <= MAX_OUTPUT_BYTES) return text + const head = text.slice(0, MAX_OUTPUT_BYTES) + return `${head}\n…[output truncated at ${MAX_OUTPUT_BYTES.toString()} bytes]` +} + +export async function executeBash(input: BashInput): Promise<BashResult> { + const timeoutMs = input.timeoutMs ?? 
DEFAULT_TIMEOUT_MS + return await new Promise((resolve) => { + const child = spawn('bash', ['-lc', input.command], { + cwd: bashCwd(), + stdio: ['ignore', 'pipe', 'pipe'], + }) + + let stdout = '' + let stderr = '' + let timedOut = false + + const timer = setTimeout(() => { + timedOut = true + child.kill('SIGKILL') + }, timeoutMs) + + child.stdout.on('data', (b: Buffer) => { + stdout += b.toString('utf8') + }) + child.stderr.on('data', (b: Buffer) => { + stderr += b.toString('utf8') + }) + + child.on('error', (err) => { + clearTimeout(timer) + resolve({ + stdout: clip(stdout), + stderr: clip(`${stderr}${err.message}`), + exitCode: -1, + timedOut, + }) + }) + + child.on('close', (code) => { + clearTimeout(timer) + resolve({ + stdout: clip(stdout), + stderr: clip(stderr), + exitCode: code ?? -1, + timedOut, + }) + }) + }) +} diff --git a/packages/tools-core/src/runtime/file-write.ts b/packages/tools-core/src/runtime/file-write.ts new file mode 100644 index 0000000..f568eee --- /dev/null +++ b/packages/tools-core/src/runtime/file-write.ts @@ -0,0 +1,76 @@ +// FileWrite (runtime) — write a file under /workspace from inside the +// agent's container. +// +// Distinct from packages/tools-core/src/file-write.ts which writes under +// the host's ~/.agent-forge/. The runtime version is sandboxed to +// /workspace : the agent has no way to escape its container's mount. +// +// Path traversal (..), null bytes, and absolute paths outside /workspace +// are refused. Existing files are overwritten by default — unlike the +// host tool which is strict — because in-sandbox iteration is expected +// (agents often rewrite their own files mid-loop). +// +// The sandbox root defaults to /workspace (the in-container mount) but +// can be overridden via FORGE_WORKSPACE — useful for tests that want to +// run on the host without touching /workspace. 
+ +import { mkdirSync, writeFileSync } from 'node:fs' +import { dirname, isAbsolute, join, resolve } from 'node:path' +import { z } from 'zod' +import { WORKSPACE_DIR } from './bash.ts' + +function sandboxRoot(): string { + return process.env.FORGE_WORKSPACE ?? WORKSPACE_DIR +} + +export const RuntimeFileWriteInputSchema = z.object({ + path: z + .string() + .min(1) + .describe( + 'Path inside the agent sandbox (/workspace). Either relative ("notes.md") or absolute under /workspace ("/workspace/src/index.ts"). Paths outside /workspace are rejected.', + ), + content: z.string().describe('Full file content to write.'), +}) + +export type RuntimeFileWriteInput = z.infer<typeof RuntimeFileWriteInputSchema> + +export type RuntimeFileWriteResult = + | { ok: true; absolutePath: string; bytes: number } + | { ok: false; error: string } + +export function resolveSandboxedPath(rawPath: string): + | { ok: true; absolutePath: string } + | { ok: false; error: string } { + if (rawPath.includes('\0')) { + return { ok: false, error: 'path contains a null byte' } + } + const root = sandboxRoot() + const target = isAbsolute(rawPath) ? rawPath : join(root, rawPath) + const resolved = resolve(target) + if (resolved !== root && !resolved.startsWith(`${root}/`)) { + return { + ok: false, + error: `path escapes the agent sandbox (${root})`, + } + } + return { ok: true, absolutePath: resolved } +} + +export function executeRuntimeFileWrite( + input: RuntimeFileWriteInput, +): RuntimeFileWriteResult { + const safe = resolveSandboxedPath(input.path) + if (!safe.ok) return safe + try { + mkdirSync(dirname(safe.absolutePath), { recursive: true }) + writeFileSync(safe.absolutePath, input.content, 'utf8') + return { + ok: true, + absolutePath: safe.absolutePath, + bytes: Buffer.byteLength(input.content, 'utf8'), + } + } catch (err) { + return { ok: false, error: err instanceof Error ? 
err.message : String(err) } + } +} diff --git a/packages/tools-core/tests/runtime-bash.test.ts b/packages/tools-core/tests/runtime-bash.test.ts new file mode 100644 index 0000000..b0b0d10 --- /dev/null +++ b/packages/tools-core/tests/runtime-bash.test.ts @@ -0,0 +1,52 @@ +// Round-trip tests for the runtime-side Bash tool. +// Uses FORGE_WORKSPACE so the cwd is a temp dir, not /workspace. + +import { afterAll, beforeAll, describe, expect, test } from 'bun:test' +import { mkdtempSync, rmSync, writeFileSync } from 'node:fs' +import { tmpdir } from 'node:os' +import { join } from 'node:path' + +let TMP_WORKSPACE: string +const ORIGINAL_ENV = process.env.FORGE_WORKSPACE + +beforeAll(() => { + TMP_WORKSPACE = mkdtempSync(join(tmpdir(), 'forge-rt-bash-')) + process.env.FORGE_WORKSPACE = TMP_WORKSPACE +}) + +afterAll(() => { + if (ORIGINAL_ENV === undefined) delete process.env.FORGE_WORKSPACE + else process.env.FORGE_WORKSPACE = ORIGINAL_ENV + rmSync(TMP_WORKSPACE, { recursive: true, force: true }) +}) + +const { executeBash } = await import('../src/runtime/bash.ts') + +describe('executeBash', () => { + test('captures stdout from a simple command', async () => { + const r = await executeBash({ command: 'echo hello' }) + expect(r.exitCode).toBe(0) + expect(r.stdout.trim()).toBe('hello') + expect(r.stderr).toBe('') + expect(r.timedOut).toBe(false) + }) + + test('captures stderr and a non-zero exit code', async () => { + const r = await executeBash({ command: 'echo oops 1>&2 ; exit 7' }) + expect(r.exitCode).toBe(7) + expect(r.stderr.trim()).toBe('oops') + }) + + test('runs in the sandbox cwd', async () => { + writeFileSync(join(TMP_WORKSPACE, 'marker.txt'), 'present') + const r = await executeBash({ command: 'cat marker.txt' }) + expect(r.exitCode).toBe(0) + expect(r.stdout).toBe('present') + }) + + test('honors a tight timeout', async () => { + const r = await executeBash({ command: 'sleep 5', timeoutMs: 200 }) + expect(r.timedOut).toBe(true) + 
expect(r.exitCode).not.toBe(0) + }) +}) diff --git a/packages/tools-core/tests/runtime-file-write.test.ts b/packages/tools-core/tests/runtime-file-write.test.ts new file mode 100644 index 0000000..8087566 --- /dev/null +++ b/packages/tools-core/tests/runtime-file-write.test.ts @@ -0,0 +1,99 @@ +// Security and round-trip tests for the runtime-side FileWrite tool. +// Uses FORGE_WORKSPACE to point the sandbox at a temp dir so the tests +// don't try to write to /workspace on the host. + +import { afterAll, afterEach, beforeAll, describe, expect, test } from 'bun:test' +import { existsSync, mkdtempSync, readFileSync, rmSync } from 'node:fs' +import { tmpdir } from 'node:os' +import { join } from 'node:path' + +let TMP_WORKSPACE: string +const ORIGINAL_ENV = process.env.FORGE_WORKSPACE + +beforeAll(() => { + TMP_WORKSPACE = mkdtempSync(join(tmpdir(), 'forge-rt-fw-')) + process.env.FORGE_WORKSPACE = TMP_WORKSPACE +}) + +afterAll(() => { + if (ORIGINAL_ENV === undefined) delete process.env.FORGE_WORKSPACE + else process.env.FORGE_WORKSPACE = ORIGINAL_ENV + rmSync(TMP_WORKSPACE, { recursive: true, force: true }) +}) + +// Late import so module-level reads of process.env happen after we set it. +const { + executeRuntimeFileWrite, + resolveSandboxedPath, +} = await import('../src/runtime/file-write.ts') + +afterEach(() => { + // Wipe contents but keep the dir itself so the env var stays valid. 
+ for (const entry of [ + 'a.txt', + 'sub/b.txt', + 'sub', + 'overwrite-me.txt', + ]) { + const p = join(TMP_WORKSPACE, entry) + if (existsSync(p)) rmSync(p, { recursive: true, force: true }) + } +}) + +describe('resolveSandboxedPath (runtime)', () => { + test('accepts a relative path under the sandbox', () => { + const r = resolveSandboxedPath('a.txt') + expect(r.ok).toBe(true) + if (r.ok) expect(r.absolutePath).toBe(join(TMP_WORKSPACE, 'a.txt')) + }) + + test('rejects path traversal', () => { + const r = resolveSandboxedPath('../escape.txt') + expect(r.ok).toBe(false) + }) + + test('rejects absolute path outside the sandbox', () => { + const r = resolveSandboxedPath('/etc/passwd') + expect(r.ok).toBe(false) + }) + + test('rejects null byte', () => { + const r = resolveSandboxedPath('foo\0bar') + expect(r.ok).toBe(false) + }) +}) + +describe('executeRuntimeFileWrite', () => { + test('writes a file in the sandbox', () => { + const r = executeRuntimeFileWrite({ path: 'a.txt', content: 'hi' }) + expect(r.ok).toBe(true) + if (r.ok) { + expect(readFileSync(r.absolutePath, 'utf8')).toBe('hi') + expect(r.bytes).toBe(2) + } + }) + + test('creates parent directories', () => { + const r = executeRuntimeFileWrite({ + path: 'sub/b.txt', + content: 'nested', + }) + expect(r.ok).toBe(true) + if (r.ok) expect(readFileSync(r.absolutePath, 'utf8')).toBe('nested') + }) + + test('overwrites an existing file', () => { + executeRuntimeFileWrite({ path: 'overwrite-me.txt', content: 'v1' }) + const r = executeRuntimeFileWrite({ path: 'overwrite-me.txt', content: 'v2' }) + expect(r.ok).toBe(true) + if (r.ok) expect(readFileSync(r.absolutePath, 'utf8')).toBe('v2') + }) + + test('refuses path escaping the sandbox', () => { + const r = executeRuntimeFileWrite({ + path: '../evil.txt', + content: 'x', + }) + expect(r.ok).toBe(false) + }) +}) From cc720f060924bd1422267fc3522c872b826cff7f Mon Sep 17 00:00:00 2001 From: Georges Garnier Date: Mon, 27 Apr 2026 14:02:10 +0200 Subject: [PATCH 02/11] 
=?UTF-8?q?feat(p4):=20add=20four=20remaining=20nati?=
 =?UTF-8?q?ve=20tools=20=E2=80=94=20read,=20edit,=20grep,=20glob?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Completes the P4 tool catalog. Agents now have the full six: bash,
write, read, edit, grep, glob. All are sandboxed to /workspace, callable
via fenced forge:* blocks, validated by Zod, and capped on output size
to protect the LLM context.

- read: line-based offset/limit, 16 KB clip, fails on missing or
  non-regular files
- edit: exact substring patch, refuses ambiguous matches unless
  replaceAll=true, refuses identical old/new
- grep: pure JS regex over a glob filter, skips binary files (NUL
  bytes), capped at 200 hits, lines clipped at 400 chars
- glob: hand-rolled matcher for *, **, ? (no dep), capped at 200
  results, walk bounded at 5000 nodes

Tool dispatcher in the runtime is now a switch over six branches. System
prompt lists all six with their JSON shape.

Tests added for each tool plus four new parser cases (forge:read / edit
/ grep / glob, and a refine-rule violation on edit).

resolveSandboxedPath is now exported so tools that don't write but still
need the sandbox root (grep, glob) reuse it instead of duplicating the
FORGE_WORKSPACE override logic.
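The edit rule above (exact substring, unique match unless replaceAll, no-op edits refused) can be illustrated on an in-memory string. This is a sketch under assumptions, not the contents of file-edit.ts, which this excerpt does not show: `applyEdit` and `EditResult` are hypothetical names, the error wording is invented, and the real tool presumably also resolves the path inside the sandbox and reads and writes the file.

```typescript
// Hypothetical result shape; the real tool's result type is not shown here.
type EditResult =
  | { ok: true; content: string; replacements: number }
  | { ok: false; error: string }

// Pure string version of the rule: no file I/O, no sandbox checks.
function applyEdit(
  content: string,
  oldString: string,
  newString: string,
  replaceAll = false,
): EditResult {
  if (oldString.length === 0) {
    return { ok: false, error: 'oldString must not be empty' }
  }
  if (oldString === newString) {
    return { ok: false, error: 'oldString and newString are identical' }
  }
  // Count non-overlapping exact occurrences.
  const parts = content.split(oldString)
  const count = parts.length - 1
  if (count === 0) {
    return { ok: false, error: 'oldString not found' }
  }
  if (count > 1 && !replaceAll) {
    return {
      ok: false,
      error: `oldString matches ${count.toString()} times; widen the context or set replaceAll`,
    }
  }
  // join() inserts newString literally, so `$&`-style replacement patterns
  // in newString are never interpreted.
  return { ok: true, content: parts.join(newString), replacements: count }
}
```

Refusing an ambiguous match pushes the agent to quote more surrounding context, which tends to be safer than silently patching the first occurrence.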
--- packages/runtime/src/index.ts | 81 +++++++-- packages/runtime/src/tool-protocol.ts | 154 ++++++++++++++---- packages/runtime/tests/tool-protocol.test.ts | 51 ++++++ packages/tools-core/src/index.ts | 30 ++++ packages/tools-core/src/runtime/file-edit.ts | 89 ++++++++++ packages/tools-core/src/runtime/file-read.ts | 93 +++++++++++ packages/tools-core/src/runtime/glob.ts | 123 ++++++++++++++ packages/tools-core/src/runtime/grep.ts | 106 ++++++++++++ .../tests/runtime-file-edit.test.ts | 86 ++++++++++ .../tests/runtime-file-read.test.ts | 55 +++++++ .../tools-core/tests/runtime-glob.test.ts | 53 ++++++ .../tools-core/tests/runtime-grep.test.ts | 62 +++++++ 12 files changed, 938 insertions(+), 45 deletions(-) create mode 100644 packages/tools-core/src/runtime/file-edit.ts create mode 100644 packages/tools-core/src/runtime/file-read.ts create mode 100644 packages/tools-core/src/runtime/glob.ts create mode 100644 packages/tools-core/src/runtime/grep.ts create mode 100644 packages/tools-core/tests/runtime-file-edit.test.ts create mode 100644 packages/tools-core/tests/runtime-file-read.test.ts create mode 100644 packages/tools-core/tests/runtime-glob.test.ts create mode 100644 packages/tools-core/tests/runtime-grep.test.ts diff --git a/packages/runtime/src/index.ts b/packages/runtime/src/index.ts index d8b3ab1..a473609 100644 --- a/packages/runtime/src/index.ts +++ b/packages/runtime/src/index.ts @@ -22,13 +22,21 @@ import { createOpenAI } from '@ai-sdk/openai' import { parseAgentMd } from '@agent-forge/core/types' import { executeBash, + executeRuntimeFileEdit, + executeRuntimeFileRead, executeRuntimeFileWrite, + executeRuntimeGlob, + executeRuntimeGrep, } from '@agent-forge/tools-core' import { type CoreMessage, streamText } from 'ai' import { parseFirstToolBlock, renderBashResult, + renderEditResult, + renderGlobResult, + renderGrepResult, renderInvalid, + renderReadResult, renderWriteResult, } from './tool-protocol.ts' @@ -84,7 +92,7 @@ async function readStdin(): 
Promise { const TOOL_PROMPT = ` -You have access to two native tools, callable by emitting a fenced block in your reply. +You have access to six native tools, each callable by emitting a fenced block in your reply. ## forge:bash — execute a shell command @@ -92,19 +100,51 @@ You have access to two native tools, callable by emitting a fenced block in your { "command": "ls -la", "timeoutMs": 10000 } \`\`\` -The command runs via \`bash -lc\` inside /workspace. \`timeoutMs\` is optional (default 30000, max 120000). The result (stdout, stderr, exit code) will be injected back into the conversation on the next turn. +Runs via \`bash -lc\` inside /workspace. \`timeoutMs\` defaults to 30000, capped at 120000. -## forge:write — write a file in /workspace +## forge:write — create or overwrite a file \`\`\`forge:write { "path": "src/index.ts", "content": "export const x = 1\\n" } \`\`\` -\`path\` is relative to /workspace (or an absolute path under /workspace). Existing files are overwritten. The result (absolute path, bytes written, or an error) will be injected back into the conversation on the next turn. +\`path\` is relative to /workspace (or absolute under /workspace). Existing files are overwritten. + +## forge:read — read a file + +\`\`\`forge:read +{ "path": "src/index.ts", "offset": 0, "limit": 200 } +\`\`\` + +\`offset\` and \`limit\` are line-based, both optional. Default limit 200, max 2000. Output is clipped at 16 KB ; use offset/limit to walk a long file. + +## forge:edit — patch a file by exact substring replacement + +\`\`\`forge:edit +{ "path": "src/index.ts", "oldString": "const x = 1", "newString": "const x = 2" } +\`\`\` + +\`oldString\` must match exactly once unless you set \`replaceAll\` true. If it matches multiple times, widen the surrounding context until it's unique. 
+ +## forge:grep — regex search across files + +\`\`\`forge:grep +{ "pattern": "TODO|FIXME", "glob": "src/**/*.ts", "ignoreCase": false } +\`\`\` + +\`pattern\` is a JavaScript RegExp source. \`glob\` is optional (defaults to \`**/*\`). Returns up to 200 hits with path:line:text. + +## forge:glob — list files by pattern + +\`\`\`forge:glob +{ "pattern": "src/**/*.ts" } +\`\`\` + +Supports \`*\`, \`**\`, and \`?\`. Returns up to 200 paths relative to /workspace. ## Iteration -- Emit at most ONE block per reply. Anything you write before the block is shown to the user. Anything after the block is discarded. +- Emit at most ONE block per reply. Text before the block is shown to the user. Text after the block is discarded. - After you receive a tool result, decide whether you need another tool call or whether you can produce the final answer. - When you are done, reply with plain text (no fenced block). ` @@ -139,13 +179,32 @@ async function executeToolBlock( parsed: Extract, { kind: 'tool' }>, ): Promise { const tool = parsed.tool - if (tool.kind === 'bash') { - const result = await executeBash(tool.input) - return renderBashResult(tool.input, result) + switch (tool.kind) { + case 'bash': { + const result = await executeBash(tool.input) + return renderBashResult(tool.input, result) + } + case 'write': { + const result = executeRuntimeFileWrite(tool.input) + return renderWriteResult(tool.input, result) + } + case 'read': { + const result = executeRuntimeFileRead(tool.input) + return renderReadResult(tool.input, result) + } + case 'edit': { + const result = executeRuntimeFileEdit(tool.input) + return renderEditResult(tool.input, result) + } + case 'grep': { + const result = executeRuntimeGrep(tool.input) + return renderGrepResult(tool.input, result) + } + case 'glob': { + const result = executeRuntimeGlob(tool.input) + return renderGlobResult(tool.input, result) + } } - // tool.kind === 'write' - const result = executeRuntimeFileWrite(tool.input) - return 
renderWriteResult(tool.input, result) } async function main(): Promise { diff --git a/packages/runtime/src/tool-protocol.ts b/packages/runtime/src/tool-protocol.ts index 5942b47..b9ea1a0 100644 --- a/packages/runtime/src/tool-protocol.ts +++ b/packages/runtime/src/tool-protocol.ts @@ -7,7 +7,7 @@ // 2. A consistent protocol across builder and agents simplifies debugging // and lets users read the raw stream. // -// Block grammar : +// Six tools wired today : bash, write, read, edit, grep, glob. // // ```forge:bash // { "command": "ls -la" } @@ -17,37 +17,81 @@ // { "path": "src/index.ts", "content": "..." } // ``` // +// ```forge:read +// { "path": "src/index.ts", "offset": 0, "limit": 200 } +// ``` +// +// ```forge:edit +// { "path": "src/index.ts", "oldString": "...", "newString": "..." } +// ``` +// +// ```forge:grep +// { "pattern": "TODO", "glob": "**/*.ts", "ignoreCase": true } +// ``` +// +// ```forge:glob +// { "pattern": "src/**/*.ts" } +// ``` +// // Only ONE block is parsed per turn (the first encountered). Everything -// before the block is treated as the agent's "thinking out loud" text and -// streamed to the host. Everything after the block is dropped — the agent -// will see the tool result on the next turn and continue from there. +// before the block is treated as the agent's "thinking out loud" text +// and streamed to the host. Everything after the block is dropped — the +// agent will see the tool result on the next turn and continue from there. 
import { z } from 'zod' import { BashInputSchema, + RuntimeFileEditInputSchema, + RuntimeFileReadInputSchema, RuntimeFileWriteInputSchema, + RuntimeGlobInputSchema, + RuntimeGrepInputSchema, type BashInput, type BashResult, + type GrepHit, + type RuntimeFileEditInput, + type RuntimeFileEditResult, + type RuntimeFileReadInput, + type RuntimeFileReadResult, type RuntimeFileWriteInput, type RuntimeFileWriteResult, + type RuntimeGlobInput, + type RuntimeGlobResult, + type RuntimeGrepInput, + type RuntimeGrepResult, } from '@agent-forge/tools-core' +export type ToolKind = 'bash' | 'write' | 'read' | 'edit' | 'grep' | 'glob' + export type ParsedTool = | { kind: 'bash'; input: BashInput; raw: string } | { kind: 'write'; input: RuntimeFileWriteInput; raw: string } + | { kind: 'read'; input: RuntimeFileReadInput; raw: string } + | { kind: 'edit'; input: RuntimeFileEditInput; raw: string } + | { kind: 'grep'; input: RuntimeGrepInput; raw: string } + | { kind: 'glob'; input: RuntimeGlobInput; raw: string } export type ParseOutcome = | { kind: 'none'; text: string } | { kind: 'invalid'; text: string; error: string; raw: string } | { kind: 'tool'; text: string; tool: ParsedTool } -const FENCE_RE = /```forge:(bash|write)\s*\n([\s\S]*?)```/ +const SCHEMAS: Record = { + bash: BashInputSchema, + write: RuntimeFileWriteInputSchema, + read: RuntimeFileReadInputSchema, + edit: RuntimeFileEditInputSchema, + grep: RuntimeGrepInputSchema, + glob: RuntimeGlobInputSchema, +} + +const FENCE_RE = /```forge:(bash|write|read|edit|grep|glob)\s*\n([\s\S]*?)```/ export function parseFirstToolBlock(stream: string): ParseOutcome { const m = FENCE_RE.exec(stream) if (!m) return { kind: 'none', text: stream } - const tag = m[1] as 'bash' | 'write' + const tag = m[1] as ToolKind const body = m[2] ?? 
'' const before = stream.slice(0, m.index) @@ -65,37 +109,23 @@ export function parseFirstToolBlock(stream: string): ParseOutcome { } } - if (tag === 'bash') { - const result = BashInputSchema.safeParse(parsed) - if (!result.success) { - return { - kind: 'invalid', - text: before, - error: `forge:bash input failed validation : ${formatZodError(result.error)}`, - raw: m[0], - } - } - return { - kind: 'tool', - text: before, - tool: { kind: 'bash', input: result.data, raw: m[0] }, - } - } - - // tag === 'write' - const result = RuntimeFileWriteInputSchema.safeParse(parsed) + const schema = SCHEMAS[tag] + const result = schema.safeParse(parsed) if (!result.success) { return { kind: 'invalid', text: before, - error: `forge:write input failed validation : ${formatZodError(result.error)}`, + error: `forge:${tag} input failed validation : ${formatZodError(result.error)}`, raw: m[0], } } + + // Narrow to the right ParsedTool variant by tag — the schema guarantees + // the data shape matches. return { kind: 'tool', text: before, - tool: { kind: 'write', input: result.data, raw: m[0] }, + tool: { kind: tag, input: result.data, raw: m[0] } as ParsedTool, } } @@ -105,13 +135,11 @@ function formatZodError(err: z.ZodError): string { .join(' ; ') } -// Render a tool result as the message we feed back to the LLM on the next -// turn. We use the same fenced format so the agent can recognize it as -// "the result of MY previous call". -export function renderBashResult( - input: BashInput, - result: BashResult, -): string { +// ── Result renderers : turn each tool's structured result into the +// message we feed back to the LLM on the next turn. Same `[forge:X result]` +// header so the agent recognizes it as the answer to its previous call. + +export function renderBashResult(input: BashInput, result: BashResult): string { const head = `[forge:bash result] command="${input.command}" exit=${result.exitCode.toString()}${ result.timedOut ? 
' (timed out)' : '' }` @@ -130,6 +158,64 @@ export function renderWriteResult( return `[forge:write result] FAILED on path="${input.path}" : ${result.error}` } +export function renderReadResult( + input: RuntimeFileReadInput, + result: RuntimeFileReadResult, +): string { + if (!result.ok) { + return `[forge:read result] FAILED on path="${input.path}" : ${result.error}` + } + const head = `[forge:read result] ${result.absolutePath} · lines ${(input.offset ?? 0).toString()}..${( + (input.offset ?? 0) + result.returnedLines + ).toString()} of ${result.totalLines.toString()}${result.truncatedBytes ? ' (clipped)' : ''}` + return `${head}\n--- content ---\n${result.content}` +} + +export function renderEditResult( + input: RuntimeFileEditInput, + result: RuntimeFileEditResult, +): string { + if (result.ok) { + return `[forge:edit result] ${result.absolutePath} · ${result.replacements.toString()} replacement${ + result.replacements === 1 ? '' : 's' + }` + } + return `[forge:edit result] FAILED on path="${input.path}" : ${result.error}` +} + +export function renderGlobResult( + input: RuntimeGlobInput, + result: RuntimeGlobResult, +): string { + if (!result.ok) { + return `[forge:glob result] FAILED on pattern="${input.pattern}" : ${result.error}` + } + const head = `[forge:glob result] ${result.matches.length.toString()} match${ + result.matches.length === 1 ? '' : 'es' + }${result.truncated ? ' (truncated)' : ''}` + if (result.matches.length === 0) return head + return `${head}\n${result.matches.join('\n')}` +} + +export function renderGrepResult( + input: RuntimeGrepInput, + result: RuntimeGrepResult, +): string { + if (!result.ok) { + return `[forge:grep result] FAILED on pattern="${input.pattern}" : ${result.error}` + } + const head = `[forge:grep result] ${result.hits.length.toString()} hit${ + result.hits.length === 1 ? '' : 's' + } across ${result.scanned.toString()} file${result.scanned === 1 ? '' : 's'}${ + result.truncated ? 
' (truncated)' : '' + }` + if (result.hits.length === 0) return head + const body = result.hits + .map((h: GrepHit) => `${h.path}:${h.line.toString()}: ${h.text}`) + .join('\n') + return `${head}\n${body}` +} + export function renderInvalid(error: string): string { return `[forge:tool error] ${error}\n\nFix the JSON or schema and try again.` } diff --git a/packages/runtime/tests/tool-protocol.test.ts b/packages/runtime/tests/tool-protocol.test.ts index e699fd7..b05cae1 100644 --- a/packages/runtime/tests/tool-protocol.test.ts +++ b/packages/runtime/tests/tool-protocol.test.ts @@ -71,4 +71,55 @@ describe('parseFirstToolBlock', () => { expect(r.tool.input.command).toBe('echo a') } }) + + test('parses forge:read', () => { + const r = parseFirstToolBlock( + '```forge:read\n{ "path": "src/x.ts", "offset": 10, "limit": 50 }\n```', + ) + expect(r.kind).toBe('tool') + if (r.kind === 'tool' && r.tool.kind === 'read') { + expect(r.tool.input.path).toBe('src/x.ts') + expect(r.tool.input.offset).toBe(10) + expect(r.tool.input.limit).toBe(50) + } + }) + + test('parses forge:edit', () => { + const r = parseFirstToolBlock( + '```forge:edit\n{ "path": "a.ts", "oldString": "x", "newString": "y" }\n```', + ) + expect(r.kind).toBe('tool') + if (r.kind === 'tool' && r.tool.kind === 'edit') { + expect(r.tool.input.oldString).toBe('x') + expect(r.tool.input.newString).toBe('y') + } + }) + + test('parses forge:grep', () => { + const r = parseFirstToolBlock( + '```forge:grep\n{ "pattern": "TODO", "glob": "**/*.ts", "ignoreCase": true }\n```', + ) + expect(r.kind).toBe('tool') + if (r.kind === 'tool' && r.tool.kind === 'grep') { + expect(r.tool.input.pattern).toBe('TODO') + expect(r.tool.input.ignoreCase).toBe(true) + } + }) + + test('parses forge:glob', () => { + const r = parseFirstToolBlock( + '```forge:glob\n{ "pattern": "src/**/*.ts" }\n```', + ) + expect(r.kind).toBe('tool') + if (r.kind === 'tool' && r.tool.kind === 'glob') { + expect(r.tool.input.pattern).toBe('src/**/*.ts') + } + 
}) + + test('rejects invalid forge:edit (oldString equals newString)', () => { + const r = parseFirstToolBlock( + '```forge:edit\n{ "path": "a.ts", "oldString": "x", "newString": "x" }\n```', + ) + expect(r.kind).toBe('invalid') + }) }) diff --git a/packages/tools-core/src/index.ts b/packages/tools-core/src/index.ts index ac20361..c38bac8 100644 --- a/packages/tools-core/src/index.ts +++ b/packages/tools-core/src/index.ts @@ -35,6 +35,36 @@ export { export { RuntimeFileWriteInputSchema, executeRuntimeFileWrite, + resolveSandboxedPath, type RuntimeFileWriteInput, type RuntimeFileWriteResult, } from './runtime/file-write.ts' + +export { + RuntimeFileReadInputSchema, + executeRuntimeFileRead, + type RuntimeFileReadInput, + type RuntimeFileReadResult, +} from './runtime/file-read.ts' + +export { + RuntimeFileEditInputSchema, + executeRuntimeFileEdit, + type RuntimeFileEditInput, + type RuntimeFileEditResult, +} from './runtime/file-edit.ts' + +export { + RuntimeGlobInputSchema, + executeRuntimeGlob, + type RuntimeGlobInput, + type RuntimeGlobResult, +} from './runtime/glob.ts' + +export { + RuntimeGrepInputSchema, + executeRuntimeGrep, + type GrepHit, + type RuntimeGrepInput, + type RuntimeGrepResult, +} from './runtime/grep.ts' diff --git a/packages/tools-core/src/runtime/file-edit.ts b/packages/tools-core/src/runtime/file-edit.ts new file mode 100644 index 0000000..99e7cdf --- /dev/null +++ b/packages/tools-core/src/runtime/file-edit.ts @@ -0,0 +1,89 @@ +// FileEdit (runtime) — patch a file under /workspace by replacing one +// exact substring with another. Same shape as Claude Code's Edit tool. +// +// The match must be unique unless `replaceAll: true`. This forces the +// LLM to widen its `oldString` window when it's ambiguous, instead of +// guessing which occurrence it meant. 
+ +import { readFileSync, writeFileSync } from 'node:fs' +import { z } from 'zod' +import { resolveSandboxedPath } from './file-write.ts' + +export const RuntimeFileEditInputSchema = z + .object({ + path: z.string().min(1).describe('File path under /workspace.'), + oldString: z + .string() + .min(1) + .describe( + 'Exact substring to find. Must match exactly once unless replaceAll is true.', + ), + newString: z.string().describe('Replacement substring.'), + replaceAll: z + .boolean() + .optional() + .describe('Replace every occurrence. Default false.'), + }) + .refine((v) => v.oldString !== v.newString, { + message: 'oldString and newString must differ', + path: ['newString'], + }) + +export type RuntimeFileEditInput = z.infer<typeof RuntimeFileEditInputSchema> + +export type RuntimeFileEditResult = + | { ok: true; absolutePath: string; replacements: number } + | { ok: false; error: string } + +function countOccurrences(haystack: string, needle: string): number { + if (needle.length === 0) return 0 + let count = 0 + let i = 0 + while (true) { + const at = haystack.indexOf(needle, i) + if (at === -1) return count + count += 1 + i = at + needle.length + } +} + +export function executeRuntimeFileEdit( + input: RuntimeFileEditInput, +): RuntimeFileEditResult { + const safe = resolveSandboxedPath(input.path) + if (!safe.ok) return safe + + let original: string + try { + original = readFileSync(safe.absolutePath, 'utf8') + } catch (err) { + return { ok: false, error: err instanceof Error ? err.message : String(err) } + } + + const occurrences = countOccurrences(original, input.oldString) + if (occurrences === 0) { + return { ok: false, error: 'oldString not found in file' } + } + if (occurrences > 1 && !input.replaceAll) { + return { + ok: false, + error: `oldString matches ${occurrences.toString()} times — widen the context or set replaceAll=true`, + } + } + + const updated = input.replaceAll + ? 
original.split(input.oldString).join(input.newString) + : original.replace(input.oldString, () => input.newString) + + try { + writeFileSync(safe.absolutePath, updated, 'utf8') + } catch (err) { + return { ok: false, error: err instanceof Error ? err.message : String(err) } + } + + return { + ok: true, + absolutePath: safe.absolutePath, + replacements: input.replaceAll ? occurrences : 1, + } +} diff --git a/packages/tools-core/src/runtime/file-read.ts b/packages/tools-core/src/runtime/file-read.ts new file mode 100644 index 0000000..7e15ba9 --- /dev/null +++ b/packages/tools-core/src/runtime/file-read.ts @@ -0,0 +1,93 @@ +// FileRead (runtime) — read a file under /workspace. +// +// Offset/limit are line-based (matches what an LLM expects when reading +// source files). Output is clipped at 16 KB to protect the LLM context ; +// any further reading should use offset. + +import { readFileSync, statSync } from 'node:fs' +import { z } from 'zod' +import { resolveSandboxedPath } from './file-write.ts' + +export const RuntimeFileReadInputSchema = z.object({ + path: z + .string() + .min(1) + .describe( + 'Path inside the agent sandbox (/workspace). Relative or absolute under /workspace.', + ), + offset: z + .number() + .int() + .min(0) + .optional() + .describe('Line offset (0-based index of the first line of the slice). Default 0.'), + limit: z + .number() + .int() + .positive() + .max(2000) + .optional() + .describe('Max number of lines to return. 
Default 200, max 2000.'), +}) + +export type RuntimeFileReadInput = z.infer<typeof RuntimeFileReadInputSchema> + +export type RuntimeFileReadResult = + | { + ok: true + absolutePath: string + content: string + totalLines: number + returnedLines: number + truncatedBytes: boolean + } + | { ok: false; error: string } + +const DEFAULT_LIMIT = 200 +const MAX_BYTES = 16_384 + +export function executeRuntimeFileRead( + input: RuntimeFileReadInput, +): RuntimeFileReadResult { + const safe = resolveSandboxedPath(input.path) + if (!safe.ok) return safe + + let raw: string + try { + const st = statSync(safe.absolutePath) + if (!st.isFile()) { + return { ok: false, error: `not a regular file : ${safe.absolutePath}` } + } + raw = readFileSync(safe.absolutePath, 'utf8') + } catch (err) { + return { ok: false, error: err instanceof Error ? err.message : String(err) } + } + + const allLines = raw.split('\n') + // Drop the trailing empty element when the file ends with \n so totalLines + // reflects the human count, not a split() artifact. + if (allLines.length > 0 && allLines[allLines.length - 1] === '') { + allLines.pop() + } + const totalLines = allLines.length + + const offset = input.offset ?? 0 + const limit = input.limit ?? DEFAULT_LIMIT + const slice = allLines.slice(offset, offset + limit) + let content = slice.join('\n') + + let truncatedBytes = false + if (Buffer.byteLength(content, 'utf8') > MAX_BYTES) { + truncatedBytes = true + content = `${content.slice(0, MAX_BYTES)}\n…[output truncated at ${MAX_BYTES.toString()} bytes — use offset/limit for the rest]` + } + + return { + ok: true, + absolutePath: safe.absolutePath, + content, + totalLines, + returnedLines: slice.length, + truncatedBytes, + } +} diff --git a/packages/tools-core/src/runtime/glob.ts b/packages/tools-core/src/runtime/glob.ts new file mode 100644 index 0000000..7692037 --- /dev/null +++ b/packages/tools-core/src/runtime/glob.ts @@ -0,0 +1,123 @@ +// Glob (runtime) — find files matching a glob pattern under /workspace. 
+// +// Hand-rolled to avoid adding a dependency to the in-container bundle. +// Supports the patterns LLMs actually use : `*`, `**`, `?`. No braces, +// no character classes — those rarely appear in agent-emitted patterns +// and would just bloat the parser. +// +// Returns relative paths (from the sandbox root) sorted alphabetically. +// Capped at 200 results. + +import { readdirSync, statSync } from 'node:fs' +import { join, relative, resolve, sep } from 'node:path' +import { z } from 'zod' +import { resolveSandboxedPath } from './file-write.ts' + +export const RuntimeGlobInputSchema = z.object({ + pattern: z + .string() + .min(1) + .describe( + 'Glob pattern relative to /workspace. Supports *, **, and ?. Example : "src/**/*.ts".', + ), +}) + +export type RuntimeGlobInput = z.infer<typeof RuntimeGlobInputSchema> + +export type RuntimeGlobResult = + | { ok: true; matches: string[]; truncated: boolean } + | { ok: false; error: string } + +const MAX_MATCHES = 200 +const MAX_WALK_NODES = 5000 + +// Convert a glob to a RegExp that must match the whole relative path +// (anchored at both ends). Each segment is converted independently and +// joined with `/`. +function globToRegex(pattern: string): RegExp { + // Normalize : split on / and process per segment. + const parts = pattern.split('/') + const out: string[] = [] + for (const part of parts) { + if (part === '**') { + out.push('(?:.*?)') + continue + } + let segment = '' + for (const ch of part) { + if (ch === '*') segment += '[^/]*' + else if (ch === '?') segment += '[^/]' + else if (/[.+^${}()|[\]\\]/.test(ch)) segment += `\\${ch}` + else segment += ch + } + out.push(segment) + } + // Glue : `/` between regular segments, but `**` already swallows separators. 
+ let glued = '' + for (let i = 0; i < out.length; i += 1) { + const part = out[i] as string + if (i === 0) { + glued = part + continue + } + const prev = out[i - 1] + if (prev === '(?:.*?)' || part === '(?:.*?)') glued += part + else glued += `/${part}` + } + return new RegExp(`^${glued}$`) +} + +// Walk a directory tree and return relative POSIX paths of all FILES. +// Bounded by MAX_WALK_NODES to protect against pathological trees. +function walk(root: string): string[] { + const out: string[] = [] + const stack: string[] = [root] + let visited = 0 + while (stack.length > 0 && visited < MAX_WALK_NODES) { + const dir = stack.pop() as string + let entries: string[] + try { + entries = readdirSync(dir) + } catch { + continue + } + for (const name of entries) { + visited += 1 + if (visited >= MAX_WALK_NODES) break + const full = join(dir, name) + let st: ReturnType + try { + st = statSync(full) + } catch { + continue + } + if (st.isDirectory()) { + stack.push(full) + } else if (st.isFile()) { + const rel = relative(root, full).split(sep).join('/') + out.push(rel) + } + } + } + return out +} + +export function executeRuntimeGlob( + input: RuntimeGlobInput, +): RuntimeGlobResult { + // Resolve sandbox root via a dummy path : ensures we use the same + // FORGE_WORKSPACE override as the other runtime tools. + const safeRoot = resolveSandboxedPath('.') + if (!safeRoot.ok) return safeRoot + const root = resolve(safeRoot.absolutePath) + + const re = globToRegex(input.pattern) + const all = walk(root) + const matched = all.filter((p) => re.test(p)).sort() + const truncated = matched.length > MAX_MATCHES + return { + ok: true, + matches: truncated ? 
matched.slice(0, MAX_MATCHES) : matched, + truncated, + } +} diff --git a/packages/tools-core/src/runtime/grep.ts b/packages/tools-core/src/runtime/grep.ts new file mode 100644 index 0000000..b7ae6e8 --- /dev/null +++ b/packages/tools-core/src/runtime/grep.ts @@ -0,0 +1,106 @@ +// Grep (runtime) — regex search across files under /workspace. +// +// Pure JS, no ripgrep dependency : the alpine container doesn't ship rg +// by default and we don't want to bloat the image just for this. For a +// POC the trade-off is fine ; if it becomes a bottleneck we'll bind-mount +// rg later. +// +// The pattern is a JavaScript RegExp source. Files are filtered by an +// optional glob to keep the scan bounded. Binary-looking content +// (NUL bytes in the first 4 KB) is skipped. + +import { readFileSync } from 'node:fs' +import { join } from 'node:path' +import { z } from 'zod' +import { resolveSandboxedPath } from './file-write.ts' +import { executeRuntimeGlob } from './glob.ts' + +export const RuntimeGrepInputSchema = z.object({ + pattern: z + .string() + .min(1) + .describe('JavaScript RegExp source. Example : "TODO|FIXME".'), + glob: z + .string() + .optional() + .describe( + 'Optional file pattern relative to /workspace (e.g. "src/**/*.ts"). Defaults to "**/*".', + ), + ignoreCase: z.boolean().optional().describe('Case-insensitive match. 
Default false.'), +}) + +export type RuntimeGrepInput = z.infer<typeof RuntimeGrepInputSchema> + +export type GrepHit = { path: string; line: number; text: string } + +export type RuntimeGrepResult = + | { ok: true; hits: GrepHit[]; truncated: boolean; scanned: number } + | { ok: false; error: string } + +const MAX_HITS = 200 +const MAX_LINE_LEN = 400 // clip long lines so a minified file doesn't blow context +const MAX_FILE_BYTES = 1_048_576 // skip files > 1 MB + +function looksBinary(buf: Buffer): boolean { + const limit = Math.min(buf.length, 4096) + for (let i = 0; i < limit; i += 1) { + if (buf[i] === 0) return true + } + return false +} + +export function executeRuntimeGrep( + input: RuntimeGrepInput, +): RuntimeGrepResult { + let re: RegExp + try { + re = new RegExp(input.pattern, input.ignoreCase ? 'i' : undefined) + } catch (err) { + return { ok: false, error: `invalid regex : ${err instanceof Error ? err.message : String(err)}` } + } + + const safeRoot = resolveSandboxedPath('.') + if (!safeRoot.ok) return safeRoot + + const filesResult = executeRuntimeGlob({ pattern: input.glob ?? '**/*' }) + if (!filesResult.ok) return filesResult + + const hits: GrepHit[] = [] + let truncated = false + let scanned = 0 + + for (const rel of filesResult.matches) { + if (hits.length >= MAX_HITS) { + truncated = true + break + } + const abs = join(safeRoot.absolutePath, rel) + let buf: Buffer + try { + buf = readFileSync(abs) + } catch { + continue + } + if (buf.length > MAX_FILE_BYTES) continue + if (looksBinary(buf)) continue + scanned += 1 + const text = buf.toString('utf8') + const lines = text.split('\n') + for (let i = 0; i < lines.length; i += 1) { + const line = lines[i] as string + if (re.test(line)) { + hits.push({ + path: rel, + line: i + 1, + text: line.length > MAX_LINE_LEN ? 
`${line.slice(0, MAX_LINE_LEN)}…` : line, + }) + if (hits.length >= MAX_HITS) { + truncated = true + break + } + } + } + } + + return { ok: true, hits, truncated, scanned } +} diff --git a/packages/tools-core/tests/runtime-file-edit.test.ts b/packages/tools-core/tests/runtime-file-edit.test.ts new file mode 100644 index 0000000..37d0ddb --- /dev/null +++ b/packages/tools-core/tests/runtime-file-edit.test.ts @@ -0,0 +1,86 @@ +import { afterAll, beforeAll, describe, expect, test } from 'bun:test' +import { mkdtempSync, readFileSync, rmSync, writeFileSync } from 'node:fs' +import { tmpdir } from 'node:os' +import { join } from 'node:path' + +let TMP_WORKSPACE: string +const ORIGINAL_ENV = process.env.FORGE_WORKSPACE + +beforeAll(() => { + TMP_WORKSPACE = mkdtempSync(join(tmpdir(), 'forge-rt-fe-')) + process.env.FORGE_WORKSPACE = TMP_WORKSPACE +}) + +afterAll(() => { + if (ORIGINAL_ENV === undefined) delete process.env.FORGE_WORKSPACE + else process.env.FORGE_WORKSPACE = ORIGINAL_ENV + rmSync(TMP_WORKSPACE, { recursive: true, force: true }) +}) + +const { executeRuntimeFileEdit } = await import('../src/runtime/file-edit.ts') + +describe('executeRuntimeFileEdit', () => { + test('replaces a unique substring', () => { + const path = join(TMP_WORKSPACE, 'a.ts') + writeFileSync(path, 'const x = 1\nconst y = 2\n') + const r = executeRuntimeFileEdit({ + path: 'a.ts', + oldString: 'const x = 1', + newString: 'const x = 42', + }) + expect(r.ok).toBe(true) + if (r.ok) { + expect(r.replacements).toBe(1) + expect(readFileSync(path, 'utf8')).toBe('const x = 42\nconst y = 2\n') + } + }) + + test('refuses ambiguous match without replaceAll', () => { + const path = join(TMP_WORKSPACE, 'b.ts') + writeFileSync(path, 'foo\nfoo\n') + const r = executeRuntimeFileEdit({ + path: 'b.ts', + oldString: 'foo', + newString: 'bar', + }) + expect(r.ok).toBe(false) + if (!r.ok) expect(r.error).toContain('matches 2 times') + }) + + test('replaceAll handles every occurrence', () => { + const path = 
join(TMP_WORKSPACE, 'c.ts') + writeFileSync(path, 'foo\nfoo\nfoo\n') + const r = executeRuntimeFileEdit({ + path: 'c.ts', + oldString: 'foo', + newString: 'bar', + replaceAll: true, + }) + expect(r.ok).toBe(true) + if (r.ok) { + expect(r.replacements).toBe(3) + expect(readFileSync(path, 'utf8')).toBe('bar\nbar\nbar\n') + } + }) + + test('returns an error when oldString is missing', () => { + const path = join(TMP_WORKSPACE, 'd.ts') + writeFileSync(path, 'hello') + const r = executeRuntimeFileEdit({ + path: 'd.ts', + oldString: 'goodbye', + newString: 'bye', + }) + expect(r.ok).toBe(false) + if (!r.ok) expect(r.error).toContain('not found') + }) + + test('refuses path outside the sandbox', () => { + const r = executeRuntimeFileEdit({ + path: '../escape', + oldString: 'a', + newString: 'b', + }) + expect(r.ok).toBe(false) + }) +}) diff --git a/packages/tools-core/tests/runtime-file-read.test.ts b/packages/tools-core/tests/runtime-file-read.test.ts new file mode 100644 index 0000000..e1c6374 --- /dev/null +++ b/packages/tools-core/tests/runtime-file-read.test.ts @@ -0,0 +1,55 @@ +import { afterAll, beforeAll, describe, expect, test } from 'bun:test' +import { mkdtempSync, rmSync, writeFileSync } from 'node:fs' +import { tmpdir } from 'node:os' +import { join } from 'node:path' + +let TMP_WORKSPACE: string +const ORIGINAL_ENV = process.env.FORGE_WORKSPACE + +beforeAll(() => { + TMP_WORKSPACE = mkdtempSync(join(tmpdir(), 'forge-rt-fr-')) + process.env.FORGE_WORKSPACE = TMP_WORKSPACE +}) + +afterAll(() => { + if (ORIGINAL_ENV === undefined) delete process.env.FORGE_WORKSPACE + else process.env.FORGE_WORKSPACE = ORIGINAL_ENV + rmSync(TMP_WORKSPACE, { recursive: true, force: true }) +}) + +const { executeRuntimeFileRead } = await import('../src/runtime/file-read.ts') + +describe('executeRuntimeFileRead', () => { + test('reads the full file when no offset/limit', () => { + writeFileSync(join(TMP_WORKSPACE, 'a.txt'), 'one\ntwo\nthree\n') + const r = executeRuntimeFileRead({ 
path: 'a.txt' }) + expect(r.ok).toBe(true) + if (r.ok) { + expect(r.content).toBe('one\ntwo\nthree') + expect(r.totalLines).toBe(3) + expect(r.returnedLines).toBe(3) + } + }) + + test('honors offset and limit', () => { + const lines = Array.from({ length: 10 }, (_, i) => `line${(i + 1).toString()}`).join('\n') + writeFileSync(join(TMP_WORKSPACE, 'b.txt'), lines) + const r = executeRuntimeFileRead({ path: 'b.txt', offset: 3, limit: 4 }) + expect(r.ok).toBe(true) + if (r.ok) { + expect(r.content).toBe('line4\nline5\nline6\nline7') + expect(r.totalLines).toBe(10) + expect(r.returnedLines).toBe(4) + } + }) + + test('rejects path outside the sandbox', () => { + const r = executeRuntimeFileRead({ path: '../escape.txt' }) + expect(r.ok).toBe(false) + }) + + test('returns an error for missing files', () => { + const r = executeRuntimeFileRead({ path: 'nope.txt' }) + expect(r.ok).toBe(false) + }) +}) diff --git a/packages/tools-core/tests/runtime-glob.test.ts b/packages/tools-core/tests/runtime-glob.test.ts new file mode 100644 index 0000000..cdfe6ec --- /dev/null +++ b/packages/tools-core/tests/runtime-glob.test.ts @@ -0,0 +1,53 @@ +import { afterAll, beforeAll, describe, expect, test } from 'bun:test' +import { mkdirSync, mkdtempSync, rmSync, writeFileSync } from 'node:fs' +import { tmpdir } from 'node:os' +import { join } from 'node:path' + +let TMP_WORKSPACE: string +const ORIGINAL_ENV = process.env.FORGE_WORKSPACE + +beforeAll(() => { + TMP_WORKSPACE = mkdtempSync(join(tmpdir(), 'forge-rt-gl-')) + process.env.FORGE_WORKSPACE = TMP_WORKSPACE + mkdirSync(join(TMP_WORKSPACE, 'src/sub'), { recursive: true }) + writeFileSync(join(TMP_WORKSPACE, 'src/index.ts'), '') + writeFileSync(join(TMP_WORKSPACE, 'src/sub/util.ts'), '') + writeFileSync(join(TMP_WORKSPACE, 'src/sub/util.test.ts'), '') + writeFileSync(join(TMP_WORKSPACE, 'README.md'), '') +}) + +afterAll(() => { + if (ORIGINAL_ENV === undefined) delete process.env.FORGE_WORKSPACE + else process.env.FORGE_WORKSPACE = 
ORIGINAL_ENV + rmSync(TMP_WORKSPACE, { recursive: true, force: true }) +}) + +const { executeRuntimeGlob } = await import('../src/runtime/glob.ts') + +describe('executeRuntimeGlob', () => { + test('matches all .ts files recursively with **/*.ts', () => { + const r = executeRuntimeGlob({ pattern: '**/*.ts' }) + expect(r.ok).toBe(true) + if (r.ok) { + expect(r.matches).toEqual(['src/index.ts', 'src/sub/util.test.ts', 'src/sub/util.ts']) + } + }) + + test('matches a single segment with src/*.ts', () => { + const r = executeRuntimeGlob({ pattern: 'src/*.ts' }) + expect(r.ok).toBe(true) + if (r.ok) expect(r.matches).toEqual(['src/index.ts']) + }) + + test('matches with ? for single char', () => { + const r = executeRuntimeGlob({ pattern: 'README.m?' }) + expect(r.ok).toBe(true) + if (r.ok) expect(r.matches).toEqual(['README.md']) + }) + + test('returns empty when nothing matches', () => { + const r = executeRuntimeGlob({ pattern: '**/*.rs' }) + expect(r.ok).toBe(true) + if (r.ok) expect(r.matches).toEqual([]) + }) +}) diff --git a/packages/tools-core/tests/runtime-grep.test.ts b/packages/tools-core/tests/runtime-grep.test.ts new file mode 100644 index 0000000..4e711ba --- /dev/null +++ b/packages/tools-core/tests/runtime-grep.test.ts @@ -0,0 +1,62 @@ +import { afterAll, beforeAll, describe, expect, test } from 'bun:test' +import { mkdirSync, mkdtempSync, rmSync, writeFileSync } from 'node:fs' +import { tmpdir } from 'node:os' +import { join } from 'node:path' + +let TMP_WORKSPACE: string +const ORIGINAL_ENV = process.env.FORGE_WORKSPACE + +beforeAll(() => { + TMP_WORKSPACE = mkdtempSync(join(tmpdir(), 'forge-rt-gr-')) + process.env.FORGE_WORKSPACE = TMP_WORKSPACE + mkdirSync(join(TMP_WORKSPACE, 'src'), { recursive: true }) + writeFileSync( + join(TMP_WORKSPACE, 'src/index.ts'), + '// TODO: implement\nexport const x = 1\n// fixme later\n', + ) + writeFileSync(join(TMP_WORKSPACE, 'src/util.ts'), 'export const todo = "x"\n') + writeFileSync(join(TMP_WORKSPACE, 
'README.md'), '# project\nTODO: write docs\n') +}) + +afterAll(() => { + if (ORIGINAL_ENV === undefined) delete process.env.FORGE_WORKSPACE + else process.env.FORGE_WORKSPACE = ORIGINAL_ENV + rmSync(TMP_WORKSPACE, { recursive: true, force: true }) +}) + +const { executeRuntimeGrep } = await import('../src/runtime/grep.ts') + +describe('executeRuntimeGrep', () => { + test('finds case-sensitive matches across files', () => { + const r = executeRuntimeGrep({ pattern: 'TODO' }) + expect(r.ok).toBe(true) + if (r.ok) { + const paths = r.hits.map((h) => h.path).sort() + expect(paths).toEqual(['README.md', 'src/index.ts']) + } + }) + + test('honors ignoreCase', () => { + const r = executeRuntimeGrep({ pattern: 'todo', ignoreCase: true }) + expect(r.ok).toBe(true) + if (r.ok) { + const paths = r.hits.map((h) => h.path).sort() + // util.ts matches via "const todo", index.ts via TODO, README.md via TODO. + expect(paths).toEqual(['README.md', 'src/index.ts', 'src/util.ts']) + } + }) + + test('respects the glob filter', () => { + const r = executeRuntimeGrep({ pattern: 'TODO', glob: '**/*.md' }) + expect(r.ok).toBe(true) + if (r.ok) { + const paths = r.hits.map((h) => h.path) + expect(paths).toEqual(['README.md']) + } + }) + + test('returns an error for an invalid regex', () => { + const r = executeRuntimeGrep({ pattern: '(' }) + expect(r.ok).toBe(false) + }) +}) From b38f18492a2240c9420303fc7ec06d1dcf7891dc Mon Sep 17 00:00:00 2001 From: Georges Garnier Date: Mon, 27 Apr 2026 14:08:22 +0200 Subject: [PATCH 03/11] fix(cli,core): auto-quote AGENT.md description when it contains a colon MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Mistral Small (and likely most small models) regularly emits an AGENT.md where the `description` value embeds a colon — typically when listing steps ("Step 1: ..., Step 2: ...") or quoting another key (`maxTurns: 8`, `timeout: 60s`). YAML reads that as a nested mapping and rejects the whole frontmatter. 
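
To make the failing shape concrete, here is a minimal sketch of a single-value check, assuming the same rule the message describes (a value is at risk when it contains a colon and is not already wrapped in quotes). `needsQuoting` is a hypothetical name used for illustration only, not a function from this patch.

```typescript
// Illustrative only: decides whether one `description` value would be
// mis-read by YAML. `needsQuoting` is a hypothetical helper, not part
// of the patch.
function needsQuoting(value: string): boolean {
  const v = value.trim()
  // An empty value has nothing to quote.
  if (v.length === 0) return false
  // Values already wrapped in matching quotes are left alone.
  const quoted =
    (v.startsWith('"') && v.endsWith('"')) ||
    (v.startsWith("'") && v.endsWith("'"))
  // Assumption: any colon in an unquoted value is treated as unsafe.
  return !quoted && v.includes(':')
}
```

A value such as `Audits the project. Step 1: list files.` is flagged, while the same text wrapped in double quotes is not.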
Two fixes : 1. The builder system prompt now spells out the rule in both EN and FR : no colon / no embedded YAML / wrap in double quotes if needed. Comes with an example so the LLM has a template to follow. 2. The CLI normalizer now scans the frontmatter and wraps any `description` value containing an unquoted colon in double quotes, escaping any embedded double quotes in the process. Already-quoted values are left alone. Tests cover both : an unquoted "Step 1: ... Step 2: ..." is fixed up and accepted ; an already-quoted equivalent is left untouched. --- packages/cli/src/builder-actions.ts | 48 ++++++++++++++++++++-- packages/cli/tests/builder-actions.test.ts | 42 +++++++++++++++++++ packages/core/src/builder/system-prompt.ts | 2 + 3 files changed, 88 insertions(+), 4 deletions(-) diff --git a/packages/cli/src/builder-actions.ts b/packages/cli/src/builder-actions.ts index 8858c2a..889ce4c 100644 --- a/packages/cli/src/builder-actions.ts +++ b/packages/cli/src/builder-actions.ts @@ -144,17 +144,57 @@ export type RunActionExecution = { export type ActionExecution = WriteActionExecution | RunActionExecution +function quoteUnsafeDescription(content: string): string { + // Small models commonly write a `description` value containing a colon + // (e.g. "Étape 1 : ..." or "...timeout: 60s..."), which YAML mis-parses + // as a nested mapping and chokes the whole frontmatter. Detect that case + // and wrap the value in double quotes ; the parser then reads it as a + // plain string. 
+ const lines = content.split('\n') + let inFrontmatter = false + let fmFenceCount = 0 + for (let i = 0; i < lines.length; i += 1) { + const line = lines[i] as string + if (line.trim() === '---') { + fmFenceCount += 1 + inFrontmatter = fmFenceCount === 1 + if (fmFenceCount === 2) break + continue + } + if (!inFrontmatter) continue + const m = /^(\s*description\s*:\s*)(.*)$/.exec(line) + if (!m) continue + const prefix = m[1] as string + const value = (m[2] as string).trim() + if (value.length === 0) continue + // Already quoted ? leave it alone. + if ( + (value.startsWith('"') && value.endsWith('"')) || + (value.startsWith("'") && value.endsWith("'")) + ) { + continue + } + if (!value.includes(':')) continue + // Escape any embedded double quotes so the wrap stays valid. + const safe = value.replace(/"/g, '\\"') + lines[i] = `${prefix}"${safe}"` + } + return lines.join('\n') +} + function normalizeAgentMd(content: string): string { // Small models often confuse the protocol separator (`---` between path // and content) with the YAML frontmatter opener and forget to write a // leading `---`. If the content looks like raw frontmatter (starts with a // recognized key), prepend `---` so it parses cleanly. 
const trimmed = content.replace(/^\s+/, '') - if (trimmed.startsWith('---')) return content - if (/^(name|description|model|sandbox|maxTurns)\s*:/m.test(trimmed)) { - return `---\n${content.replace(/^\s+/, '')}` + let normalized = content + if (!trimmed.startsWith('---')) { + if (/^(name|description|model|sandbox|maxTurns)\s*:/m.test(trimmed)) { + normalized = `---\n${content.replace(/^\s+/, '')}` + } } - return content + return quoteUnsafeDescription(normalized) } const AGENT_PATH_RE = /^(agents\/[a-z][a-z0-9-]*)\/[^/]+$/ diff --git a/packages/cli/tests/builder-actions.test.ts b/packages/cli/tests/builder-actions.test.ts index 25c58d1..18259cf 100644 --- a/packages/cli/tests/builder-actions.test.ts +++ b/packages/cli/tests/builder-actions.test.ts @@ -198,6 +198,48 @@ body` if (exec.kind === 'write') expect(exec.result.ok).toBe(true) }) + test('quotes a description that contains an unquoted colon', () => { + const unsafe = `--- +name: ${TEST_AGENT} +description: Audits the project. Step 1: list files. Step 2: fix TODOs. +sandbox: + image: agent-forge/base:latest + timeout: 60s +maxTurns: 1 +--- + +body` + const exec = executeAction({ + kind: 'write', + path: `agents/${TEST_AGENT}/AGENT.md`, + content: unsafe, + raw: '', + }) + expect(exec.kind).toBe('write') + if (exec.kind === 'write') expect(exec.result.ok).toBe(true) + }) + + test('leaves an already-quoted description untouched', () => { + const safe = `--- +name: ${TEST_AGENT} +description: "Step 1: do this. Step 2: do that." 
+sandbox: + image: agent-forge/base:latest + timeout: 60s +maxTurns: 1 +--- + +body` + const exec = executeAction({ + kind: 'write', + path: `agents/${TEST_AGENT}/AGENT.md`, + content: safe, + raw: '', + }) + expect(exec.kind).toBe('write') + if (exec.kind === 'write') expect(exec.result.ok).toBe(true) + }) + test('run action passes through pre-flight (actual launch is async)', () => { const exec = executeAction({ kind: 'run', diff --git a/packages/core/src/builder/system-prompt.ts b/packages/core/src/builder/system-prompt.ts index 81ab986..8d3d869 100644 --- a/packages/core/src/builder/system-prompt.ts +++ b/packages/core/src/builder/system-prompt.ts @@ -35,6 +35,7 @@ You are a haiku poet. Answer with exactly three lines, syllables 5-7-5. ABSOLUTE rules — failing any of these IS A BUG : - The path MUST be exactly \`agents//AGENT.md\`. The filename MUST be the literal string \`AGENT.md\`. Never invent variants like \`haiku-writer.md\` or \`HAIKU-WRITER.md\`. - The file content MUST start with a YAML frontmatter block : a line \`---\`, then the YAML keys (name, description, sandbox, maxTurns), then a closing \`---\`, then the body. Look at the example above carefully — there are TWO \`---\` after the \`path:\` line : the first one separates the path from the content, the second one OPENS the frontmatter. +- The \`description\` value MUST be a single line of plain prose, with NO colon (\`:\`), NO YAML-looking syntax (\`key: value\`), NO line break, NO unbalanced quote. If you cannot write it cleanly without a colon, wrap the whole value in double quotes : \`description: "Audits the project. Step 1: list files. Step 2: fix TODOs."\`. Never repeat the values of the other keys (\`maxTurns\`, \`timeout\`) inside \`description\` — they go in the body of the AGENT.md instead. - The block opens with three backticks + \`forge:write\` and CLOSES with three backticks on their own line. - Replace placeholders with real values. Do not keep angle brackets. 
- Always propose the block first and ask the user to confirm with "yes" / "go" / "ok" before re-emitting it. @@ -83,6 +84,7 @@ Tu es un poète haïku. Réponds par exactement trois lignes, syllabes 5-7-5. Règles ABSOLUES — toute violation EST UN BUG : - Le chemin DOIT être exactement \`agents//AGENT.md\`. Le nom de fichier DOIT être la chaîne littérale \`AGENT.md\`. N'invente jamais de variante comme \`haiku-writer.md\` ou \`HAIKU-WRITER.md\`. - Le contenu du fichier DOIT commencer par un bloc YAML frontmatter : une ligne \`---\`, puis les clés YAML (name, description, sandbox, maxTurns), puis un \`---\` de fermeture, puis le corps. Regarde bien l'exemple ci-dessus — il y a DEUX \`---\` après la ligne \`path:\` : le premier sépare le path du contenu, le second OUVRE le frontmatter. +- La valeur de \`description\` DOIT être une seule ligne de prose simple, SANS deux-points (\`:\`), SANS syntaxe ressemblant à du YAML (\`clé: valeur\`), SANS retour à la ligne, SANS guillemet non fermé. Si tu ne peux pas écrire la valeur proprement sans deux-points, encadre toute la valeur entre guillemets doubles : \`description: "Audite le projet. Étape 1 : lister les fichiers. Étape 2 : corriger les TODO."\`. Ne répète JAMAIS les valeurs des autres clés (\`maxTurns\`, \`timeout\`) dans la \`description\` — elles vont dans le corps de l'AGENT.md. - Le bloc s'ouvre par trois backticks + \`forge:write\` et se FERME par trois backticks sur leur propre ligne. - Remplace les placeholders par des vraies valeurs. Ne laisse pas les chevrons. - Propose toujours le bloc d'abord et demande la confirmation (« oui » / « ok » / « go ») avant de le ré-émettre. 
From 2bbf412af90336b97a2e1da013bfa5e790e5c7a9 Mon Sep 17 00:00:00 2001
From: Georges Garnier
Date: Mon, 27 Apr 2026 14:22:53 +0200
Subject: [PATCH 04/11] feat(cli): Tab to focus a Mission Control card, Enter to open it
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Cards in Mission Control are now keyboard-navigable :

- Tab        cycle focus forward (lands on the most recent card the
             first time)
- Shift+Tab  cycle focus backward
- Enter      open the focused card in a full-screen detail view
- Esc / q    close the detail view

The detail view uses the entire terminal, shows the action's full
content (the AGENT.md body for write actions ; prompt + streamed output
for run actions) with line numbers, and supports scrolling with arrow
keys / PgUp / PgDn / g / G.

Tab/Enter are only captured when there are actions, no permission
dialog is up, the detail view is closed, and the prompt input is empty,
so typing in the prompt always wins. The prompt draft is now lifted
into useChat so App can read it for that guard.

Visual cues : the focused card switches to a brighter "double" border
and gains a leading triangle ; the Mission Control header changes its
hint line depending on whether anything is focused.
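
The focus walk described above can be sketched as a pure function. This is a simplified illustration under stated assumptions, not the hook shipped in this patch: `nextFocus` and `Direction` are hypothetical names, and cards are reduced to a list of ids ordered oldest to newest.

```typescript
// Hypothetical helper, not part of the patch: a pure version of the
// Tab / Shift+Tab focus walk over a list of card ids.
type Direction = 'forward' | 'backward'

function nextFocus(
  ids: string[],
  current: string | null,
  dir: Direction,
): string | null {
  if (ids.length === 0) return null
  const idx = current === null ? -1 : ids.indexOf(current)
  // No focus yet (or the focused card vanished): Tab lands on the most
  // recent card, Shift+Tab on the oldest one.
  if (idx === -1) {
    return dir === 'forward' ? (ids[ids.length - 1] ?? null) : (ids[0] ?? null)
  }
  // Otherwise step one card in the requested direction, wrapping around.
  const step = dir === 'forward' ? 1 : -1
  const next = (idx + step + ids.length) % ids.length
  return ids[next] ?? null
}
```

Keeping the walk free of React state makes the wrap-around and the "first press" cases easy to check in isolation.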
--- packages/cli/src/components/App.tsx | 44 ++++- packages/cli/src/components/CardDetail.tsx | 157 ++++++++++++++++++ .../cli/src/components/MissionControl.tsx | 63 +++++-- packages/cli/src/components/Welcome.tsx | 11 +- packages/cli/src/hooks/useCardFocus.ts | 85 ++++++++++ packages/cli/src/hooks/useChat.ts | 11 ++ 6 files changed, 348 insertions(+), 23 deletions(-) create mode 100644 packages/cli/src/components/CardDetail.tsx create mode 100644 packages/cli/src/hooks/useCardFocus.ts diff --git a/packages/cli/src/components/App.tsx b/packages/cli/src/components/App.tsx index 4687356..fbfe321 100644 --- a/packages/cli/src/components/App.tsx +++ b/packages/cli/src/components/App.tsx @@ -9,11 +9,16 @@ // └──────────────┘ ← terminal bottom (FIXED) // // PgUp / PgDn / Ctrl+E scroll the chat transcript inside Welcome. +// Tab / Shift+Tab cycle focus through Mission Control cards (only when +// the prompt input is empty so it doesn't fight TextInput). Enter on a +// focused card opens a full-screen CardDetail view ; Esc closes it. import { Box, useInput, useStdin } from 'ink' import React from 'react' import { useChatContext } from '../hooks/useChatContext.tsx' +import { useCardFocus } from '../hooks/useCardFocus.ts' import { useLanguage } from '../i18n/LanguageContext.tsx' +import { CardDetail } from './CardDetail.tsx' import { MissionControl } from './MissionControl.tsx' import { ProviderLogo } from './ProviderLogo.tsx' import { Splash } from './Splash.tsx' @@ -22,25 +27,54 @@ import { Welcome } from './Welcome.tsx' export function App(): React.JSX.Element { const { lang } = useLanguage() const { isRawModeSupported } = useStdin() - const { scrollUp, scrollDown, scrollToBottom, pending, state } = useChatContext() + const { scrollUp, scrollDown, scrollToBottom, pending, state, promptDraft } = + useChatContext() + const focus = useCardFocus(state.actions) const rows = process.stdout.rows ?? 30 const cols = process.stdout.columns ?? 
80 const hasPending = pending !== null const hasActions = state.actions.length > 0 + const promptIsEmpty = promptDraft.length === 0 + + // Tab/Enter is only meaningful when there are actions, the prompt is + // empty (so TextInput doesn't lose its keystrokes), and no permission + // dialog is showing. + const cardKeysActive = + isRawModeSupported && + lang !== null && + !focus.detailOpen && + !hasPending && + hasActions && + promptIsEmpty useInput( - (_, key) => { + (input, key) => { if (key.pageUp) scrollUp() else if (key.pageDown) scrollDown() - else if (key.ctrl && _ === 'e') scrollToBottom() + else if (key.ctrl && input === 'e') scrollToBottom() + else if (cardKeysActive && key.tab && key.shift) focus.cycleBack() + else if (cardKeysActive && key.tab) focus.cycle() + else if (cardKeysActive && key.return) focus.open() }, - { isActive: isRawModeSupported && lang !== null }, + { isActive: isRawModeSupported && lang !== null && !focus.detailOpen }, ) + // Detail view : modal full-screen replacement. + if (focus.detailOpen && focus.focusedId !== null) { + const action = state.actions.find((a) => a.id === focus.focusedId) + if (action) { + return + } + } + return ( - {hasActions ? : } + {hasActions ? ( + + ) : ( + + )} {/* Spacer pushes Welcome to the bottom AND parks the provider logo at the bottom-right of the top zone (just above the Welcome diff --git a/packages/cli/src/components/CardDetail.tsx b/packages/cli/src/components/CardDetail.tsx new file mode 100644 index 0000000..986480f --- /dev/null +++ b/packages/cli/src/components/CardDetail.tsx @@ -0,0 +1,157 @@ +// Full-screen detail view for a single Mission Control action. +// +// Mounted by App when useCardFocus reports detailOpen=true. Replaces +// both Mission Control AND Welcome — the user gets the entire screen +// to read the full content of the action they pressed Enter on. +// +// Scrolls line-by-line with PgUp / PgDn / arrow up/down. Esc closes. 
+ +import { Box, Text, useInput } from 'ink' +import React, { useState } from 'react' +import type { Action, ActionStatus, RunAction, WriteAction } from '../actions/types.ts' +import { C } from '../theme/colors.ts' +import { + type HighlightedLine, + type Segment, + highlightPlain, + highlightYamlText, +} from './syntax.ts' + +const STATUS_LABEL: Record = { + proposed: 'PROPOSED', + approved: 'APPROVED', + running: 'RUNNING', + done: 'DONE', + failed: 'FAILED', + declined: 'DECLINED', +} + +const STATUS_COLOR: Record = { + proposed: C.orange, + approved: C.orangeBright, + running: C.yellow, + done: C.green, + failed: C.red, + declined: C.grey, +} + +function buildLines(action: Action): HighlightedLine[] { + if (action.kind === 'write') { + return highlightYamlText(action.content) + } + // run : prompt then output + const out: HighlightedLine[] = [] + out.push([{ text: '── prompt ──', color: C.grey, dim: true }]) + out.push(...highlightPlain(action.prompt)) + out.push([{ text: '' }]) + out.push([{ text: '── output ──', color: C.grey, dim: true }]) + if (action.output.length > 0) { + out.push(...highlightPlain(action.output)) + } else { + out.push([{ text: '(empty)', color: C.grey, dim: true }]) + } + if (action.status === 'failed' && action.error) { + out.push([{ text: '' }]) + out.push([{ text: `✗ ${action.error}`, color: C.red }]) + } + return out +} + +function headerFor(action: Action): string { + if (action.kind === 'write') return `write ${action.path}` + return `run ${action.agent}` +} + +export function CardDetail({ + action, + onClose, +}: { + action: Action + onClose: () => void +}): React.JSX.Element { + const rows = process.stdout.rows ?? 30 + const cols = process.stdout.columns ?? 80 + const lines = buildLines(action) + + // Reserve : 2 rows for the title bar, 2 rows for the footer hint, 1 + // separator. Body gets the rest. 
+ const bodyHeight = Math.max(5, rows - 5) + const [offset, setOffset] = useState(0) + const maxOffset = Math.max(0, lines.length - bodyHeight) + + useInput((input, key) => { + if (key.escape || input === 'q') { + onClose() + return + } + if (key.pageUp) setOffset((o) => Math.max(0, o - bodyHeight)) + else if (key.pageDown) setOffset((o) => Math.min(maxOffset, o + bodyHeight)) + else if (key.upArrow) setOffset((o) => Math.max(0, o - 1)) + else if (key.downArrow) setOffset((o) => Math.min(maxOffset, o + 1)) + else if (input === 'g') setOffset(0) + else if (input === 'G') setOffset(maxOffset) + }) + + const visible = lines.slice(offset, offset + bodyHeight) + const totalLines = lines.length + const lastShown = Math.min(totalLines, offset + bodyHeight) + + return ( + + {/* Title bar */} + + + {`[${STATUS_LABEL[action.status]}]`} + + + {' detail '} + + {headerFor(action)} + + + {'─'.repeat(cols)} + + + {/* Body */} + + {visible.map((segments: HighlightedLine, i: number) => { + const lineNo = offset + i + 1 + return ( + + + {`${lineNo.toString().padStart(4, ' ')} `} + + {segments.map((seg: Segment, j: number) => ( + + {seg.text} + + ))} + + ) + })} + + + {/* Footer */} + + {'─'.repeat(cols)} + + + + + {`lines ${(offset + 1).toString()}..${lastShown.toString()} of ${totalLines.toString()}`} + + + + + {'[↑↓ / PgUp/PgDn] scroll [g/G] top/bottom [Esc / q] close'} + + + + + ) +} diff --git a/packages/cli/src/components/MissionControl.tsx b/packages/cli/src/components/MissionControl.tsx index 9edc30f..0dbd0d3 100644 --- a/packages/cli/src/components/MissionControl.tsx +++ b/packages/cli/src/components/MissionControl.tsx @@ -84,7 +84,8 @@ function StatusBadge({ status }: { status: ActionStatus }): React.JSX.Element { ) } -function borderColorFor(status: ActionStatus): string { +function borderColorFor(status: ActionStatus, focused: boolean): string { + if (focused) return C.orangeBright if (status === 'done') return C.green if (status === 'failed') return C.red if (status === 
'declined') return C.grey @@ -94,16 +95,18 @@ function borderColorFor(status: ActionStatus): string { function CardFrame({ status, + focused, children, }: { status: ActionStatus + focused: boolean children: React.ReactNode }): React.JSX.Element { return ( + {focused ? '▸ ' : ' '} + + ) +} + +function WriteCard({ + action, + focused, +}: { + action: WriteAction + focused: boolean +}): React.JSX.Element { const lines = highlightYamlText(action.content) return ( - + + {' write '} {action.path} @@ -140,12 +158,19 @@ function WriteCard({ action }: { action: WriteAction }): React.JSX.Element { ) } -function RunCard({ action }: { action: RunAction }): React.JSX.Element { +function RunCard({ + action, + focused, +}: { + action: RunAction + focused: boolean +}): React.JSX.Element { const promptLines = highlightPlain(action.prompt) const outputLines = action.output.length > 0 ? highlightPlain(action.output) : [] return ( - + + {' run '} {action.agent} @@ -173,8 +198,10 @@ function RunCard({ action }: { action: RunAction }): React.JSX.Element { export function MissionControl({ actions, + focusedId, }: { actions: Action[] + focusedId: string | null }): React.JSX.Element { const cols = process.stdout.columns ?? 80 return ( @@ -191,14 +218,24 @@ export function MissionControl({ {` ${actions.length.toString()} action${actions.length === 1 ? '' : 's'}`} + {focusedId === null ? ( + + {' [Tab] focus a card · [Enter] open detail'} + + ) : ( + + {' [Enter] open detail · [Tab/Shift+Tab] cycle'} + + )} - {actions.map((a) => - a.kind === 'write' ? ( - + {actions.map((a) => { + const focused = a.id === focusedId + return a.kind === 'write' ? 
( + ) : ( - - ), - )} + + ) + })} ) } diff --git a/packages/cli/src/components/Welcome.tsx b/packages/cli/src/components/Welcome.tsx index e1dcee4..3bb709d 100644 --- a/packages/cli/src/components/Welcome.tsx +++ b/packages/cli/src/components/Welcome.tsx @@ -14,7 +14,7 @@ import { Box, Text, useApp, useStdin } from 'ink' import TextInput from 'ink-text-input' -import React, { useState } from 'react' +import React from 'react' import { getCurrentModelName } from '@agent-forge/core/builder' import { isCommand, runCommand } from '../commands.ts' import { useChatContext } from '../hooks/useChatContext.tsx' @@ -39,7 +39,6 @@ export function Welcome(): React.JSX.Element { const { lang, setLang } = useLanguage() const { exit } = useApp() const { isRawModeSupported } = useStdin() - const [input, setInput] = useState('') const { state, send, @@ -51,6 +50,8 @@ export function Welcome(): React.JSX.Element { pending, approvePending, declinePending, + promptDraft, + setPromptDraft, } = useChatContext() const hasMessages = state.messages.length > 0 || state.streaming !== null @@ -59,7 +60,7 @@ export function Welcome(): React.JSX.Element { const handleSubmit = (value: string): void => { const trimmed = value.trim() if (!trimmed || busy) return - setInput('') + setPromptDraft('') if (isCommand(trimmed)) { addSystemMessage(trimmed) @@ -116,8 +117,8 @@ export function Welcome(): React.JSX.Element { {' ❯ '} {isRawModeSupported ? ( diff --git a/packages/cli/src/hooks/useCardFocus.ts b/packages/cli/src/hooks/useCardFocus.ts new file mode 100644 index 0000000..a40bd4a --- /dev/null +++ b/packages/cli/src/hooks/useCardFocus.ts @@ -0,0 +1,85 @@ +// Mission Control card focus + detail view state. +// +// Kept separate from useChat so the chat hook stays focused on +// conversation/action state. 
Exposes : +// - focusedId : id of the action currently highlighted (or null) +// - detailOpen : whether the full-screen detail panel is mounted +// - cycle / cycleBack / open / close : the actions wired to Tab keys +// +// Behaviour : +// - Tab from "no focus" → focus the LAST action (most recent on top +// of Mission Control reads as bottom of the list, so we land on +// what the user just saw). +// - Tab again → walk forward; wraps around. +// - Shift+Tab → walk backward; wraps around. +// - When the focused action disappears (cleared, etc.), focus resets. + +import { useCallback, useEffect, useState } from 'react' +import type { Action } from '../actions/types.ts' + +export type CardFocusApi = { + focusedId: string | null + detailOpen: boolean + cycle: () => void + cycleBack: () => void + open: () => void + close: () => void + clearFocus: () => void +} + +export function useCardFocus(actions: Action[]): CardFocusApi { + const [focusedId, setFocusedId] = useState(null) + const [detailOpen, setDetailOpen] = useState(false) + + // If the focused action disappears (e.g. /clear), drop focus and the + // detail panel together so we never display a stale card. + useEffect(() => { + if (focusedId === null) return + const stillThere = actions.some((a) => a.id === focusedId) + if (!stillThere) { + setFocusedId(null) + setDetailOpen(false) + } + }, [actions, focusedId]) + + const cycle = useCallback(() => { + if (actions.length === 0) return + setFocusedId((current) => { + if (current === null) { + return actions[actions.length - 1]?.id ?? null + } + const idx = actions.findIndex((a) => a.id === current) + if (idx === -1) return actions[actions.length - 1]?.id ?? null + const next = (idx + 1) % actions.length + return actions[next]?.id ?? null + }) + }, [actions]) + + const cycleBack = useCallback(() => { + if (actions.length === 0) return + setFocusedId((current) => { + if (current === null) { + return actions[0]?.id ?? 
null + } + const idx = actions.findIndex((a) => a.id === current) + if (idx === -1) return actions[0]?.id ?? null + const prev = (idx - 1 + actions.length) % actions.length + return actions[prev]?.id ?? null + }) + }, [actions]) + + const open = useCallback(() => { + if (focusedId !== null) setDetailOpen(true) + }, [focusedId]) + + const close = useCallback(() => { + setDetailOpen(false) + }, []) + + const clearFocus = useCallback(() => { + setFocusedId(null) + setDetailOpen(false) + }, []) + + return { focusedId, detailOpen, cycle, cycleBack, open, close, clearFocus } +} diff --git a/packages/cli/src/hooks/useChat.ts b/packages/cli/src/hooks/useChat.ts index 199272f..13dce72 100644 --- a/packages/cli/src/hooks/useChat.ts +++ b/packages/cli/src/hooks/useChat.ts @@ -116,6 +116,8 @@ export function useChat(lang: Lang): { pending: Action | null approvePending: () => void declinePending: () => void + promptDraft: string + setPromptDraft: (value: string) => void } { const [state, setState] = useState({ messages: [], @@ -125,6 +127,13 @@ export function useChat(lang: Lang): { }) const [busy, setBusy] = useState(false) const [scrollOffset, setScrollOffset] = useState(0) + // Lifted out of Welcome so App can know when the input is empty (and + // thus capture Tab for Mission Control focus without stealing keys + // from the prompt). + const [promptDraft, setPromptDraftState] = useState('') + const setPromptDraft = useCallback((value: string) => { + setPromptDraftState(value) + }, []) // Buffer des messages cachés mais toujours envoyés au LLM dans `send`. // `/clear` y déplace les messages visibles (vue vide, contexte préservé) ; // `/reset` le purge. Stocké en ref pour ne pas redéclencher de rendu. 
@@ -366,5 +375,7 @@ export function useChat(lang: Lang): { pending: headPending, approvePending, declinePending, + promptDraft, + setPromptDraft, } } From 3bd47c5ce804834d78f3305f6a35e3baaef21ced Mon Sep 17 00:00:00 2001 From: Georges Garnier Date: Mon, 27 Apr 2026 14:35:11 +0200 Subject: [PATCH 05/11] feat(cli): Esc clears the Mission Control card focus MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit When a card is focused but the detail view isn't open, pressing Esc now drops the focus without opening anything. Guarded so it only fires when the prompt is empty and no permission dialog is up — Esc keeps its meaning everywhere else. Header hint updated accordingly. --- packages/cli/src/components/App.tsx | 11 +++++++++++ packages/cli/src/components/MissionControl.tsx | 2 +- 2 files changed, 12 insertions(+), 1 deletion(-) diff --git a/packages/cli/src/components/App.tsx b/packages/cli/src/components/App.tsx index fbfe321..c85a8c9 100644 --- a/packages/cli/src/components/App.tsx +++ b/packages/cli/src/components/App.tsx @@ -55,6 +55,17 @@ export function App(): React.JSX.Element { else if (cardKeysActive && key.tab && key.shift) focus.cycleBack() else if (cardKeysActive && key.tab) focus.cycle() else if (cardKeysActive && key.return) focus.open() + // Esc clears the card focus (only when something is focused and + // the prompt is empty, so we never swallow an Esc the user meant + // for cancelling input). 
+ else if ( + key.escape && + promptIsEmpty && + !hasPending && + focus.focusedId !== null + ) { + focus.clearFocus() + } }, { isActive: isRawModeSupported && lang !== null && !focus.detailOpen }, ) diff --git a/packages/cli/src/components/MissionControl.tsx b/packages/cli/src/components/MissionControl.tsx index 0dbd0d3..999b350 100644 --- a/packages/cli/src/components/MissionControl.tsx +++ b/packages/cli/src/components/MissionControl.tsx @@ -224,7 +224,7 @@ export function MissionControl({ ) : ( - {' [Enter] open detail · [Tab/Shift+Tab] cycle'} + {' [Enter] open detail · [Tab/Shift+Tab] cycle · [Esc] unfocus'} )} From a5c5a328a6ec2f55d60cc4b05aafc0ec5fba552e Mon Sep 17 00:00:00 2001 From: Georges Garnier Date: Mon, 27 Apr 2026 14:51:26 +0200 Subject: [PATCH 06/11] =?UTF-8?q?feat(p6):=20skill=20layer=20=E2=80=94=20h?= =?UTF-8?q?igh-level=20orchestration=20patterns=20for=20the=20builder?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The builder now has access to a catalog of skills : self-contained behaviour modules that orchestrate multiple actions in a single turn to handle recurring intent patterns. First built-in : scaffold-and-run, which fixes the "user describes both creation and execution but the builder stops after writing AGENT.md" pattern by making the LLM emit a forge:write AND a forge:run in the same turn. Architecture : - SKILL.md format with YAML frontmatter (name, description, triggers, actions) and a markdown body containing the instructions. Mirrors AGENT.md to stay familiar. - Catalog loader discovers skills from two sources : built-ins shipped in packages/core/src/builder/skills/, plus user skills under ~/.agent-forge/skills/. User skills override built-ins on name collision. - System prompt only carries the catalog metadata (name + description + triggers) — bodies stay out of the context until the LLM emits a forge:skill block, then the resolved body is injected as a system message for the next turn. 
- New ParsedAction kind 'skill' with a forge:skill fenced block parser ; tolerant of either `name: ` or a bare line. - SkillAction joins WriteAction/RunAction in the action store. Skills auto-execute (no permission dialog) and surface as their own card in Mission Control with a "loaded into context" hint. CardDetail renders the description plus the full body so the user can see what the skill actually injects. Other : - /skills slash command lists what's available, source-tagged. - useChat resolves the catalog once via useMemo, threads it to streamBuilder and to executeAction's resolveSkill. Tests : - SKILL.md schema (kebab-case name, missing frontmatter, unknown action tag). - Catalog loader discovers the built-in and sorts entries. - System prompt injects the SKILLS section when entries are provided, FR/EN headers, base prompt unchanged when empty. - forge:skill parser (name: prefix, bare line, kebab-case rejection) and executeAction(skill) round-trip via resolver. --- packages/cli/src/actions/types.ts | 18 ++- packages/cli/src/builder-actions.ts | 88 +++++++++++-- packages/cli/src/commands.ts | 23 ++++ packages/cli/src/components/CardDetail.tsx | 20 ++- .../cli/src/components/MissionControl.tsx | 48 ++++++- packages/cli/src/hooks/useChat.ts | 118 ++++++++++++++++-- packages/cli/tests/builder-actions.test.ts | 58 +++++++++ packages/core/src/builder/index.ts | 11 +- packages/core/src/builder/skill-catalog.ts | 114 +++++++++++++++++ .../src/builder/skills/scaffold-and-run.md | 59 +++++++++ packages/core/src/builder/stream.ts | 19 ++- packages/core/src/builder/system-prompt.ts | 64 +++++++++- packages/core/src/types/index.ts | 10 ++ packages/core/src/types/skill-md.ts | 88 +++++++++++++ packages/core/tests/skill-catalog.test.ts | 27 ++++ packages/core/tests/skill-md.test.ts | 65 ++++++++++ packages/core/tests/system-prompt.test.ts | 38 ++++++ 17 files changed, 831 insertions(+), 37 deletions(-) create mode 100644 packages/core/src/builder/skill-catalog.ts create 
mode 100644 packages/core/src/builder/skills/scaffold-and-run.md create mode 100644 packages/core/src/types/skill-md.ts create mode 100644 packages/core/tests/skill-catalog.test.ts create mode 100644 packages/core/tests/skill-md.test.ts create mode 100644 packages/core/tests/system-prompt.test.ts diff --git a/packages/cli/src/actions/types.ts b/packages/cli/src/actions/types.ts index 237ee06..c8b2dee 100644 --- a/packages/cli/src/actions/types.ts +++ b/packages/cli/src/actions/types.ts @@ -34,7 +34,23 @@ export type RunAction = { error?: string } -export type Action = WriteAction | RunAction +// Skill actions don't go through the permission dialog : loading a +// skill is read-only and instant. We still surface them as cards so the +// user sees, in Mission Control, that the builder is operating on a +// recognised pattern instead of free-styling. +export type SkillAction = { + id: string + kind: 'skill' + status: ActionStatus + skill: string + description: string // copied from the catalog at load time + body?: string // populated when status becomes 'done' + createdAt: string + finishedAt?: string + error?: string +} + +export type Action = WriteAction | RunAction | SkillAction let counter = 0 export function nextActionId(): string { diff --git a/packages/cli/src/builder-actions.ts b/packages/cli/src/builder-actions.ts index 889ce4c..829d432 100644 --- a/packages/cli/src/builder-actions.ts +++ b/packages/cli/src/builder-actions.ts @@ -1,7 +1,7 @@ // Parser + executor for the text-structured action protocol the builder // emits (see packages/core/src/builder/system-prompt.ts). // -// Two block types are recognized : +// Three block types are recognized : // // ```forge:write // path: @@ -15,6 +15,10 @@ // // ``` // +// ```forge:skill +// name: +// ``` +// // The closing fence is optional (small models sometimes forget the trailing // ```). When present, content stops there ; otherwise it extends to the // end of the message. 
@@ -22,10 +26,10 @@ import { parseAgentMd } from '@agent-forge/core/types' import { executeFileWrite } from '@agent-forge/tools-core' -const FENCE_OPEN = /```forge:(write|run)\s*\n/g +const FENCE_OPEN = /```forge:(write|run|skill)\s*\n/g // Pattern used to strip whole forge:* blocks (open + body + optional close) // from the assistant text so the chat transcript stays prose-only. -const FENCE_BLOCK = /```forge:(?:write|run)\s*\n[\s\S]*?(?:\n```|$)/g +const FENCE_BLOCK = /```forge:(?:write|run|skill)\s*\n[\s\S]*?(?:\n```|$)/g /** Remove every forge:write / forge:run block from a builder reply. * Used to keep the chat transcript free of action code — actions live in @@ -48,7 +52,13 @@ export type ParsedRunAction = { raw: string } -export type ParsedAction = ParsedWriteAction | ParsedRunAction +export type ParsedSkillAction = { + kind: 'skill' + skill: string + raw: string +} + +export type ParsedAction = ParsedWriteAction | ParsedRunAction | ParsedSkillAction export type ActionParseResult = | { ok: true; action: ParsedAction } @@ -108,19 +118,42 @@ function parseRun(inner: string, raw: string): ActionParseResult { return { ok: true, action: { kind: 'run', agent, prompt, raw } } } +function parseSkill(inner: string, raw: string): ActionParseResult { + // forge:skill expects a single key=value pair, optionally followed by a + // closing fence. Accept both `name: scaffold-and-run` (the documented + // form) and a bare line containing the name only — small models slip. + const firstLine = (inner.split('\n')[0] ?? '').trim() + const candidate = firstLine.startsWith('name:') + ? 
firstLine.slice('name:'.length).trim() + : firstLine + if (candidate.length === 0) { + return { ok: false, error: 'forge:skill block missing skill name', raw } + } + if (!/^[a-z][a-z0-9-]*$/.test(candidate)) { + return { + ok: false, + error: `forge:skill name must be kebab-case (got "${candidate}")`, + raw, + } + } + return { ok: true, action: { kind: 'skill', skill: candidate, raw } } +} + export function findActionBlocks(text: string): ActionParseResult[] { const out: ActionParseResult[] = [] const matches = [...text.matchAll(FENCE_OPEN)] for (let i = 0; i < matches.length; i++) { const m = matches[i] if (!m) continue - const kind = m[1] as 'write' | 'run' + const kind = m[1] as 'write' | 'run' | 'skill' const start = (m.index ?? 0) + m[0].length const closingIdx = text.indexOf('\n```', start) const end = closingIdx >= 0 ? closingIdx : text.length const inner = text.slice(start, end).replace(/\s+$/, '') const raw = text.slice(m.index ?? 0, end + (closingIdx >= 0 ? 4 : 0)) - out.push(kind === 'write' ? parseWrite(inner, raw) : parseRun(inner, raw)) + if (kind === 'write') out.push(parseWrite(inner, raw)) + else if (kind === 'run') out.push(parseRun(inner, raw)) + else out.push(parseSkill(inner, raw)) } return out } @@ -142,7 +175,18 @@ export type RunActionExecution = { result: { ok: false; error: string } | { ok: true } } -export type ActionExecution = WriteActionExecution | RunActionExecution +export type SkillActionExecution = { + kind: 'skill' + skill: string + // Skills are read-only : loading one cannot fail at exec time besides + // "skill not found in the catalog". The catalog is enforced upstream. 
+ result: { ok: true; body: string } | { ok: false; error: string } +} + +export type ActionExecution = + | WriteActionExecution + | RunActionExecution + | SkillActionExecution function quoteUnsafeDescription(content: string): string { // Small models commonly write a `description` value containing a colon @@ -211,19 +255,47 @@ function looksLikeAgent(path: string): boolean { return path.startsWith('agents/') } +export type ExecuteActionOptions = { + overwrite?: boolean + // Resolver injected by useChat. Returns the skill body when the LLM + // asked to load a skill. We don't import the catalog here directly to + // keep this module testable without filesystem dependencies. + resolveSkill?: (name: string) => string | null +} + /** * Synchronously prepare and (for write) execute a parsed action. * For run actions, only validates pre-conditions ; the actual launch is * driven by useChat via launchAgent() so output can be streamed. + * For skill actions, looks up the skill body via the resolver. 
*/ export function executeAction( action: ParsedAction, - options: { overwrite?: boolean } = {}, + options: ExecuteActionOptions = {}, ): ActionExecution { if (action.kind === 'run') { return { kind: 'run', agent: action.agent, result: { ok: true } } } + if (action.kind === 'skill') { + if (!options.resolveSkill) { + return { + kind: 'skill', + skill: action.skill, + result: { ok: false, error: 'no skill resolver configured' }, + } + } + const body = options.resolveSkill(action.skill) + if (body === null) { + return { + kind: 'skill', + skill: action.skill, + result: { ok: false, error: `skill not found : ${action.skill}` }, + } + } + return { kind: 'skill', skill: action.skill, result: { ok: true, body } } + } + const path = normalizeWritePath(action.path) let content = action.content diff --git a/packages/cli/src/commands.ts b/packages/cli/src/commands.ts index fb03232..ef214ad 100644 --- a/packages/cli/src/commands.ts +++ b/packages/cli/src/commands.ts @@ -5,6 +5,7 @@ import { getCurrentBaseURL, getCurrentModelName, + loadSkillCatalog, setProviderOverride, } from '@agent-forge/core/builder' import { @@ -55,6 +56,11 @@ function helpLines(lang: Lang): string[] { ` /sessions ${ lang === 'fr' ? 'liste les sessions persistées' : 'list persisted sessions' }`, + ` /skills ${ + lang === 'fr' + ? 'liste les skills disponibles' + : 'list available skills' + }`, ] } @@ -181,6 +187,23 @@ export function runCommand( return { lines } } + case '/skills': { + const catalog = loadSkillCatalog() + if (catalog.skills.length === 0) { + return { lines: [lang === 'fr' ? '(aucune skill)' : '(no skills)'] } + } + const lines = [ + lang === 'fr' + ? `${catalog.skills.length.toString()} skill(s) :` + : `${catalog.skills.length.toString()} skill(s) :`, + ] + for (const s of catalog.skills) { + const tag = s.source === 'builtin' ? 
'·built-in·' : '·user·' + lines.push(` ${s.name} ${tag} ${s.description}`) + } + return { lines } + } + default: return { lines: [t('cmdUnknown', lang)] } } diff --git a/packages/cli/src/components/CardDetail.tsx b/packages/cli/src/components/CardDetail.tsx index 986480f..2f0fa56 100644 --- a/packages/cli/src/components/CardDetail.tsx +++ b/packages/cli/src/components/CardDetail.tsx @@ -8,7 +8,7 @@ import { Box, Text, useInput } from 'ink' import React, { useState } from 'react' -import type { Action, ActionStatus, RunAction, WriteAction } from '../actions/types.ts' +import type { Action, ActionStatus } from '../actions/types.ts' import { C } from '../theme/colors.ts' import { type HighlightedLine, @@ -39,6 +39,23 @@ function buildLines(action: Action): HighlightedLine[] { if (action.kind === 'write') { return highlightYamlText(action.content) } + if (action.kind === 'skill') { + const out: HighlightedLine[] = [] + out.push([{ text: '── description ──', color: C.grey, dim: true }]) + out.push(...highlightPlain(action.description)) + out.push([{ text: '' }]) + out.push([{ text: '── instructions injected into context ──', color: C.grey, dim: true }]) + if (action.body && action.body.length > 0) { + out.push(...highlightPlain(action.body)) + } else { + out.push([{ text: '(skill body not loaded yet)', color: C.grey, dim: true }]) + } + if (action.status === 'failed' && action.error) { + out.push([{ text: '' }]) + out.push([{ text: `✗ ${action.error}`, color: C.red }]) + } + return out + } // run : prompt then output const out: HighlightedLine[] = [] out.push([{ text: '── prompt ──', color: C.grey, dim: true }]) @@ -59,6 +76,7 @@ function buildLines(action: Action): HighlightedLine[] { function headerFor(action: Action): string { if (action.kind === 'write') return `write ${action.path}` + if (action.kind === 'skill') return `skill ${action.skill}` return `run ${action.agent}` } diff --git a/packages/cli/src/components/MissionControl.tsx 
b/packages/cli/src/components/MissionControl.tsx index 999b350..d266aa2 100644 --- a/packages/cli/src/components/MissionControl.tsx +++ b/packages/cli/src/components/MissionControl.tsx @@ -10,7 +10,13 @@ import { Box, Text } from 'ink' import React from 'react' -import type { Action, ActionStatus, RunAction, WriteAction } from '../actions/types.ts' +import type { + Action, + ActionStatus, + RunAction, + SkillAction, + WriteAction, +} from '../actions/types.ts' import { C } from '../theme/colors.ts' import { type HighlightedLine, @@ -196,6 +202,38 @@ function RunCard({ ) } +function SkillCard({ + action, + focused, +}: { + action: SkillAction + focused: boolean +}): React.JSX.Element { + return ( + + + + + {' skill '} + {action.skill} + + + {action.description} + + {action.status === 'done' ? ( + + {' ✓ skill loaded into context'} + + ) : null} + {action.status === 'failed' && action.error ? ( + + {` ✗ ${action.error}`} + + ) : null} + + ) +} + export function MissionControl({ actions, focusedId, @@ -230,11 +268,9 @@ export function MissionControl({ {actions.map((a) => { const focused = a.id === focusedId - return a.kind === 'write' ? ( - - ) : ( - - ) + if (a.kind === 'write') return + if (a.kind === 'run') return + return })} ) diff --git a/packages/cli/src/hooks/useChat.ts b/packages/cli/src/hooks/useChat.ts index 13dce72..12d67e3 100644 --- a/packages/cli/src/hooks/useChat.ts +++ b/packages/cli/src/hooks/useChat.ts @@ -10,12 +10,17 @@ // Builder code blocks (```forge:*) are extracted into actions and STRIPPED // from the assistant's textual reply before that reply lands in `messages`. 
-import { type ChatMessage, streamBuilder } from '@agent-forge/core/builder' +import { + type ChatMessage, + loadSkillCatalog, + streamBuilder, +} from '@agent-forge/core/builder' import { launchAgent } from '@agent-forge/tools-core' -import { useCallback, useRef, useState } from 'react' +import { useCallback, useMemo, useRef, useState } from 'react' import { type Action, type RunAction, + type SkillAction, type WriteAction, nextActionId, } from '../actions/types.ts' @@ -63,7 +68,10 @@ function nowIso(): string { return new Date().toISOString() } -function actionFromParsed(parsed: ParsedAction): Action { +function actionFromParsed( + parsed: ParsedAction, + skillDescriptionFor: (name: string) => string, +): Action { if (parsed.kind === 'write') { return { id: nextActionId(), @@ -74,14 +82,25 @@ function actionFromParsed(parsed: ParsedAction): Action { createdAt: nowIso(), } } + if (parsed.kind === 'run') { + return { + id: nextActionId(), + kind: 'run', + status: 'proposed', + agent: parsed.agent, + prompt: parsed.prompt, + createdAt: nowIso(), + output: '', + } + } + // skill : auto-running, the executor resolves the body synchronously. 
return { id: nextActionId(), - kind: 'run', - status: 'proposed', - agent: parsed.agent, - prompt: parsed.prompt, + kind: 'skill', + status: 'running', + skill: parsed.skill, + description: skillDescriptionFor(parsed.skill), createdAt: nowIso(), - output: '', } } @@ -94,10 +113,17 @@ function parsedFromAction(action: Action): ParsedAction { raw: '', } } + if (action.kind === 'run') { + return { + kind: 'run', + agent: action.agent, + prompt: action.prompt, + raw: '', + } + } return { - kind: 'run', - agent: action.agent, - prompt: action.prompt, + kind: 'skill', + skill: action.skill, raw: '', } } @@ -127,6 +153,28 @@ export function useChat(lang: Lang): { }) const [busy, setBusy] = useState(false) const [scrollOffset, setScrollOffset] = useState(0) + // Skill catalog : loaded once at hook init, kept in a memo so callbacks + // get a stable reference. Built-ins ship with the package ; users can + // drop SKILL.md into ~/.agent-forge/skills/ to extend. + const skillCatalog = useMemo(() => loadSkillCatalog(), []) + const skillEntries = useMemo( + () => + skillCatalog.skills.map((s) => ({ + name: s.name, + description: s.description, + triggers: s.triggers, + })), + [skillCatalog], + ) + const resolveSkillBody = useCallback( + (name: string): string | null => skillCatalog.byName.get(name)?.body ?? null, + [skillCatalog], + ) + const skillDescriptionFor = useCallback( + (name: string): string => + skillCatalog.byName.get(name)?.description ?? '(unknown skill)', + [skillCatalog], + ) // Lifted out of Welcome so App can know when the input is empty (and // thus capture Tab for Mission Control focus without stealing keys // from the prompt). 
@@ -305,7 +353,11 @@ export function useChat(lang: Lang): { ] let acc = '' - for await (const chunk of streamBuilder({ messages: history, lang })) { + for await (const chunk of streamBuilder({ + messages: history, + lang, + skills: skillEntries, + })) { acc += chunk setState((prev) => prev.streaming @@ -318,6 +370,10 @@ export function useChat(lang: Lang): { const blocks = findActionBlocks(acc) const parseErrors: ChatTurn[] = [] const newActions: Action[] = [] + // Skill bodies executed inline get appended to the assistant turn + // as a system message so the next builder turn sees the full + // instructions. + const skillSystemTurns: ChatTurn[] = [] for (const block of blocks) { if (!block.ok) { parseErrors.push({ @@ -325,8 +381,42 @@ export function useChat(lang: Lang): { role: 'system', content: `✗ action skipped : ${block.error}`, }) + continue + } + const action = actionFromParsed(block.action, skillDescriptionFor) + if (action.kind === 'skill') { + // Resolve synchronously and finalise the card state in the + // same render — skills are local, free, never partial. + const exec = executeAction(block.action, { + resolveSkill: resolveSkillBody, + }) + if (exec.kind === 'skill' && exec.result.ok) { + const finalised: SkillAction = { + ...action, + status: 'done', + body: exec.result.body, + finishedAt: nowIso(), + } + newActions.push(finalised) + skillSystemTurns.push({ + id: nextId(), + role: 'system', + content: `[skill:${action.skill}] ${exec.result.body}`, + }) + } else { + const err = + exec.kind === 'skill' && !exec.result.ok + ? 
exec.result.error + : 'unknown error' + newActions.push({ + ...action, + status: 'failed', + error: err, + finishedAt: nowIso(), + }) + } } else { - newActions.push(actionFromParsed(block.action)) + newActions.push(action) } } const proseOnly = stripActionBlocks(acc) @@ -336,12 +426,14 @@ export function useChat(lang: Lang): { } persist(finalAssistant) for (const e of parseErrors) persist(e) + for (const s of skillSystemTurns) persist(s) setState((prev) => ({ ...prev, messages: [ ...prev.messages, ...(proseOnly.length > 0 ? [finalAssistant] : []), ...parseErrors, + ...skillSystemTurns, ], streaming: null, error: null, diff --git a/packages/cli/tests/builder-actions.test.ts b/packages/cli/tests/builder-actions.test.ts index 18259cf..384ab59 100644 --- a/packages/cli/tests/builder-actions.test.ts +++ b/packages/cli/tests/builder-actions.test.ts @@ -136,6 +136,64 @@ prompt }) }) +describe('findActionBlocks (skill)', () => { + test('parses a forge:skill block with name: prefix', () => { + const md = `OK je charge une skill : + +\`\`\`forge:skill +name: scaffold-and-run +\`\`\`` + const blocks = findActionBlocks(md) + expect(blocks.length).toBe(1) + expect(blocks[0]?.ok).toBe(true) + if (blocks[0]?.ok && blocks[0].action.kind === 'skill') { + expect(blocks[0].action.skill).toBe('scaffold-and-run') + } + }) + + test('parses a forge:skill block with bare name', () => { + const md = `\`\`\`forge:skill +scaffold-and-run +\`\`\`` + const blocks = findActionBlocks(md) + expect(blocks[0]?.ok).toBe(true) + if (blocks[0]?.ok && blocks[0].action.kind === 'skill') { + expect(blocks[0].action.skill).toBe('scaffold-and-run') + } + }) + + test('rejects skill with non-kebab-case name', () => { + const md = `\`\`\`forge:skill +name: ScaffoldAndRun +\`\`\`` + const blocks = findActionBlocks(md) + expect(blocks[0]?.ok).toBe(false) + }) + + test('executeAction(skill) resolves the body via the resolver', () => { + const exec = executeAction( + { kind: 'skill', skill: 'scaffold-and-run', 
raw: '' }, + { resolveSkill: (name) => (name === 'scaffold-and-run' ? 'BODY' : null) }, + ) + expect(exec.kind).toBe('skill') + if (exec.kind === 'skill') { + expect(exec.result.ok).toBe(true) + if (exec.result.ok) expect(exec.result.body).toBe('BODY') + } + }) + + test('executeAction(skill) errors when resolver returns null', () => { + const exec = executeAction( + { kind: 'skill', skill: 'unknown', raw: '' }, + { resolveSkill: () => null }, + ) + expect(exec.kind).toBe('skill') + if (exec.kind === 'skill') { + expect(exec.result.ok).toBe(false) + } + }) +}) + describe('executeAction (path coercion + agent validation)', () => { const validFrontmatter = `--- name: ${TEST_AGENT} diff --git a/packages/core/src/builder/index.ts b/packages/core/src/builder/index.ts index 696cea8..21af6a2 100644 --- a/packages/core/src/builder/index.ts +++ b/packages/core/src/builder/index.ts @@ -8,4 +8,13 @@ export { type ProviderConfig, } from './provider.ts' export { type ChatMessage, type ChatRole, streamBuilder } from './stream.ts' -export { type BuilderLang, getBuilderSystemPrompt } from './system-prompt.ts' +export { + type BuilderLang, + type SkillCatalogEntry, + getBuilderSystemPrompt, +} from './system-prompt.ts' +export { + loadSkillCatalog, + type SkillCatalog, + type SkillEntry, +} from './skill-catalog.ts' diff --git a/packages/core/src/builder/skill-catalog.ts b/packages/core/src/builder/skill-catalog.ts new file mode 100644 index 0000000..db226d9 --- /dev/null +++ b/packages/core/src/builder/skill-catalog.ts @@ -0,0 +1,114 @@ +// Skill catalog — discovers SKILL.md files from two sources : +// +// 1. Built-in : packages/core/src/builder/skills/*.md, shipped with +// the package. Resolved relative to import.meta.url so it works +// both in dev (TS through Bun) and in a built bundle (the .md +// files are copied next to the runtime source). +// +// 2. User : ~/.agent-forge/skills/.md or /SKILL.md. 
+// Read at startup ; future revisions can add a /skills reload +// slash command. +// +// Loading is lazy in the body sense : the catalog only carries the +// metadata (name + description + triggers). The body is kept on the +// SkillEntry too, but the LLM does NOT see it until it explicitly +// emits a forge:skill block — the CLI then injects the body into +// the conversation. This avoids paying tokens for skills the user +// never triggers. + +import { existsSync, readFileSync, readdirSync, statSync } from 'node:fs' +import { homedir } from 'node:os' +import { dirname, join, resolve } from 'node:path' +import { fileURLToPath } from 'node:url' +import { + type ParsedSkillMd, + SkillMdError, + parseSkillMd, +} from '../types/skill-md.ts' + +export type SkillEntry = { + name: string + description: string + triggers: string[] + actions: ParsedSkillMd['meta']['actions'] + body: string + source: 'builtin' | 'user' + filePath: string +} + +const BUILTIN_DIR = resolve(dirname(fileURLToPath(import.meta.url)), 'skills') +const USER_DIR = join(homedir(), '.agent-forge', 'skills') + +function readSkillFile(filePath: string, source: SkillEntry['source']): SkillEntry | null { + let raw: string + try { + raw = readFileSync(filePath, 'utf8') + } catch { + return null + } + let parsed: ParsedSkillMd + try { + parsed = parseSkillMd(raw) + } catch (err) { + if (err instanceof SkillMdError) { + console.error(`✗ skill ${filePath} : ${err.message}`) + return null + } + throw err + } + return { + name: parsed.meta.name, + description: parsed.meta.description, + triggers: parsed.meta.triggers, + actions: parsed.meta.actions, + body: parsed.body, + source, + filePath, + } +} + +function collectFromDir(dir: string, source: SkillEntry['source']): SkillEntry[] { + if (!existsSync(dir)) return [] + const out: SkillEntry[] = [] + for (const entry of readdirSync(dir)) { + const full = join(dir, entry) + let st: ReturnType<typeof statSync> + try { + st = statSync(full) + } catch { + continue + } + if
(st.isFile() && entry.endsWith('.md')) { + const skill = readSkillFile(full, source) + if (skill) out.push(skill) + } else if (st.isDirectory()) { + // Convention : /SKILL.md so users can group assets next to + // their skill (templates, examples, etc.). + const inner = join(full, 'SKILL.md') + if (existsSync(inner)) { + const skill = readSkillFile(inner, source) + if (skill) out.push(skill) + } + } + } + return out +} + +export type SkillCatalog = { + skills: SkillEntry[] + byName: Map<string, SkillEntry> +} + +export function loadSkillCatalog(): SkillCatalog { + const builtins = collectFromDir(BUILTIN_DIR, 'builtin') + const users = collectFromDir(USER_DIR, 'user') + + // User skills take precedence on name collision so users can + // override a built-in by writing their own. + const merged = new Map() + for (const s of builtins) merged.set(s.name, s) + for (const s of users) merged.set(s.name, s) + + const skills = Array.from(merged.values()).sort((a, b) => a.name.localeCompare(b.name)) + return { skills, byName: merged } +} diff --git a/packages/core/src/builder/skills/scaffold-and-run.md b/packages/core/src/builder/skills/scaffold-and-run.md new file mode 100644 index 0000000..39ecd91 --- /dev/null +++ b/packages/core/src/builder/skills/scaffold-and-run.md @@ -0,0 +1,59 @@ +--- +name: scaffold-and-run +description: When the user describes both an agent AND a concrete task to perform in the same message, propose creation AND execution in one builder turn instead of stopping after the write. +triggers: + - audite + - teste + - lance puis + - crée puis lance + - scaffolde et exécute + - audit + - test it + - then run + - create and run +actions: + - write + - run +--- + +# scaffold-and-run + +Activate this skill when the user's message describes **both** what an agent should be (its role, its tools, its workspace assumptions) **and** what it should do right now (a concrete task, mission, audit, or scenario to run once). + +When activated, you MUST : + +1.
Emit a fenced ```forge:write``` block creating the AGENT.md, exactly as you would normally do. +2. In the **same turn**, immediately after, emit a fenced ```forge:run``` block targeting that same agent. The prompt inside the run block is the concrete task you extracted from the user's message — phrased as an instruction to the agent, NOT as a description of what the agent is. +3. Do NOT wait for the user to ask for the run separately. The user already gave you the full intent. +4. Do NOT mix the two blocks into one. They are two independent actions, with two independent permission dialogs. The user will approve them in order. + +Both blocks must respect their usual rules : +- `forge:write` : path `agents//AGENT.md`, full YAML frontmatter, body as system prompt for the agent. +- `forge:run` : `agent: ` matching the one you just wrote, then `---`, then the prompt. + +Example shape (do not copy literally — adapt to the user's actual request) : + +```forge:write +path: agents/code-auditor/AGENT.md +--- +--- +name: code-auditor +description: "Audits a TypeScript mini-project in /workspace." +sandbox: + image: agent-forge/base:latest + timeout: 60s +maxTurns: 8 +--- + +# code-auditor + +You are a TypeScript code auditor. Use your tools to scaffold, list, read, edit and verify. +``` + +```forge:run +agent: code-auditor +--- +Scaffold src/index.ts with two TODO functions, list workspace files, read the code, replace each `return 0` by the correct implementation, then run `node -e "require('./src/index.ts')"` to verify. Answer in French. +``` + +Keep prose minimal between the two blocks — one short sentence is enough. The cards in Mission Control are what the user will read. diff --git a/packages/core/src/builder/stream.ts b/packages/core/src/builder/stream.ts index 1722f7c..dafec65 100644 --- a/packages/core/src/builder/stream.ts +++ b/packages/core/src/builder/stream.ts @@ -7,11 +7,15 @@ // the CLI parses fenced action blocks the builder emits in plain text. 
See // packages/cli/src/builder-actions.ts. -import { streamText } from 'ai' +import { streamText, type CoreMessage } from 'ai' import { getBuilderModel } from './provider.ts' -import { type BuilderLang, getBuilderSystemPrompt } from './system-prompt.ts' +import { + type BuilderLang, + type SkillCatalogEntry, + getBuilderSystemPrompt, +} from './system-prompt.ts' -export type ChatRole = 'user' | 'assistant' +export type ChatRole = 'user' | 'assistant' | 'system' export type ChatMessage = { role: ChatRole @@ -21,16 +25,21 @@ export type ChatMessage = { export type StreamBuilderArgs = { messages: ChatMessage[] lang: BuilderLang + // Catalog metadata advertised to the LLM in the system prompt. + // Bodies are NOT included here — they land in the conversation only + // after the LLM emits a forge:skill block. + skills?: SkillCatalogEntry[] } export async function* streamBuilder({ messages, lang, + skills, }: StreamBuilderArgs): AsyncGenerator { const result = streamText({ model: getBuilderModel(), - system: getBuilderSystemPrompt(lang), - messages, + system: getBuilderSystemPrompt(lang, { skills }), + messages: messages as CoreMessage[], // 512 leaves room for a full forge:write block (~300 tokens) plus a // short intro sentence. Override via FORGE_MAX_TOKENS if needed. maxTokens: Number(process.env.FORGE_MAX_TOKENS ?? '512'), diff --git a/packages/core/src/builder/system-prompt.ts b/packages/core/src/builder/system-prompt.ts index 8d3d869..b680de0 100644 --- a/packages/core/src/builder/system-prompt.ts +++ b/packages/core/src/builder/system-prompt.ts @@ -142,6 +142,66 @@ ${ACTION_BLOCK_FR} Réponds toujours en français.` -export function getBuilderSystemPrompt(lang: BuilderLang): string { - return lang === 'fr' ? FR : EN +// Skill catalog metadata as injected into the system prompt. The body +// of each skill is NOT included here — it would cost too many tokens +// for skills the user never triggers. 
The LLM only sees the entry, +// recognises a trigger, and emits a `forge:skill` block ; the CLI +// then injects the body into the conversation as a system message, +// so the next turn carries the full skill instructions. +export type SkillCatalogEntry = { + name: string + description: string + triggers: string[] +} + +const SKILLS_HEADER_EN = ` + +AVAILABLE SKILLS : + +You have access to a catalog of skills — high-level behaviours that orchestrate multiple actions in one turn for recurring intents. To activate one, emit a fenced \`forge:skill\` block. The CLI will inject the skill's full instructions as a system message in the next turn, after which you follow them. + +\`\`\`forge:skill +name: scaffold-and-run +\`\`\` + +Choose a skill when the user's message matches its trigger phrases AND you would otherwise stop too early (e.g. only writing an AGENT.md when the user clearly also wants the agent to run with a specific task). + +Catalog : +` + +const SKILLS_HEADER_FR = ` + +SKILLS DISPONIBLES : + +Tu as accès à un catalogue de skills — des comportements de haut niveau qui orchestrent plusieurs actions dans le même tour pour des intentions récurrentes. Pour en activer une, émets un bloc \`forge:skill\` encadré. La CLI injectera les instructions complètes de la skill comme message système dans le tour suivant ; tu n'as plus qu'à les appliquer. + +\`\`\`forge:skill +name: scaffold-and-run +\`\`\` + +Choisis une skill quand le message utilisateur correspond à un de ses triggers ET que sans elle tu t'arrêterais trop tôt (par ex. n'écrire qu'un AGENT.md alors que l'utilisateur veut clairement aussi le lancer avec une tâche concrète). + +Catalogue : +` + +function renderCatalog(entries: SkillCatalogEntry[]): string { + if (entries.length === 0) return '' + return entries + .map((s) => { + const triggers = + s.triggers.length > 0 ? 
` — triggers : ${s.triggers.join(', ')}` : '' + return `- ${s.name} : ${s.description}${triggers}` + }) + .join('\n') +} + +export function getBuilderSystemPrompt( + lang: BuilderLang, + options: { skills?: SkillCatalogEntry[] } = {}, +): string { + const base = lang === 'fr' ? FR : EN + const entries = options.skills ?? [] + if (entries.length === 0) return base + const header = lang === 'fr' ? SKILLS_HEADER_FR : SKILLS_HEADER_EN + return `${base}${header}${renderCatalog(entries)}` } diff --git a/packages/core/src/types/index.ts b/packages/core/src/types/index.ts index b74f28b..505de0e 100644 --- a/packages/core/src/types/index.ts +++ b/packages/core/src/types/index.ts @@ -6,3 +6,13 @@ export { type AgentMd, type ParsedAgentMd, } from './agent-md.ts' + +export { + SkillActionTagSchema, + SkillMdError, + SkillMdSchema, + parseSkillMd, + type ParsedSkillMd, + type SkillActionTag, + type SkillMd, +} from './skill-md.ts' diff --git a/packages/core/src/types/skill-md.ts b/packages/core/src/types/skill-md.ts new file mode 100644 index 0000000..76e5083 --- /dev/null +++ b/packages/core/src/types/skill-md.ts @@ -0,0 +1,88 @@ +// SKILL.md — describes a high-level builder behaviour the LLM can load +// on demand to handle a recurring intent pattern. +// +// Format : Markdown with YAML frontmatter at the top, body below. +// Example : +// +// --- +// name: scaffold-and-run +// description: When the user describes both an agent AND a concrete task in the same message, propose creation AND execution in one turn. +// triggers: +// - "audite" +// - "teste" +// - "fais que cet agent" +// - "create and run" +// actions: +// - write +// - run +// --- +// +// # scaffold-and-run +// +// When activated, you must : +// 1. Emit a forge:write block creating the AGENT.md +// 2. In the SAME turn, emit a forge:run block targeting the agent +// with a prompt that captures the user's intent +// +// The user will see two PROPOSED cards and approve them in order. 
+// +// Skills are loaded into the conversation lazily : the system prompt +// only carries the catalog metadata (name + description + triggers). +// The body lands in the context only after the LLM emits a +// forge:skill block, which the CLI executes by injecting the body as +// a system message. + +import { parse as parseYaml } from 'yaml' +import { z } from 'zod' + +const FRONTMATTER_RE = /^---\s*\n([\s\S]*?)\n---\s*\n?([\s\S]*)$/ + +export const SkillActionTagSchema = z.enum(['write', 'run', 'skill']) +export type SkillActionTag = z.infer<typeof SkillActionTagSchema> + +export const SkillMdSchema = z.object({ + name: z + .string() + .min(1) + .regex(/^[a-z][a-z0-9-]*$/, 'name must be kebab-case (lowercase, digits, hyphens)'), + description: z.string().min(1), + triggers: z.array(z.string().min(1)).default([]), + actions: z.array(SkillActionTagSchema).default([]), +}) + +export type SkillMd = z.infer<typeof SkillMdSchema> + +export type ParsedSkillMd = { + meta: SkillMd + body: string +} + +export class SkillMdError extends Error { + constructor(message: string, public readonly cause?: unknown) { + super(message) + this.name = 'SkillMdError' + } +} + +export function parseSkillMd(text: string): ParsedSkillMd { + const match = text.match(FRONTMATTER_RE) + if (!match) { + throw new SkillMdError( + 'SKILL.md must start with a YAML frontmatter block delimited by ---', + ) + } + const [, yamlText, body] = match + let parsedYaml: unknown + try { + parsedYaml = parseYaml(yamlText ?? '') + } catch (err) { + throw new SkillMdError('Invalid YAML in SKILL.md frontmatter', err) + } + const result = SkillMdSchema.safeParse(parsedYaml) + if (!result.success) { + const first = result.error.issues[0] + const path = first?.path.join('.') ?? '' + throw new SkillMdError(`Invalid SKILL.md : ${path} — ${first?.message ?? 'unknown error'}`) + } + return { meta: result.data, body: (body ??
'').trim() } +} diff --git a/packages/core/tests/skill-catalog.test.ts b/packages/core/tests/skill-catalog.test.ts new file mode 100644 index 0000000..fa86fd2 --- /dev/null +++ b/packages/core/tests/skill-catalog.test.ts @@ -0,0 +1,27 @@ +// Catalog loader tests : the built-in scaffold-and-run skill must be +// discoverable, parseable, and the resulting entry must carry name + +// description + triggers + body. + +import { describe, expect, test } from 'bun:test' +import { loadSkillCatalog } from '../src/builder/skill-catalog.ts' + +describe('loadSkillCatalog', () => { + test('discovers the built-in scaffold-and-run skill', () => { + const cat = loadSkillCatalog() + const s = cat.byName.get('scaffold-and-run') + expect(s).toBeDefined() + if (!s) return + expect(s.source).toBe('builtin') + expect(s.description.length).toBeGreaterThan(0) + expect(s.body.length).toBeGreaterThan(0) + expect(s.triggers.length).toBeGreaterThan(0) + expect(s.actions).toEqual(expect.arrayContaining(['write', 'run'])) + }) + + test('catalog skills are sorted by name', () => { + const cat = loadSkillCatalog() + const names = cat.skills.map((s) => s.name) + const sorted = [...names].sort((a, b) => a.localeCompare(b)) + expect(names).toEqual(sorted) + }) +}) diff --git a/packages/core/tests/skill-md.test.ts b/packages/core/tests/skill-md.test.ts new file mode 100644 index 0000000..c383601 --- /dev/null +++ b/packages/core/tests/skill-md.test.ts @@ -0,0 +1,65 @@ +// Schema and parser tests for SKILL.md. + +import { describe, expect, test } from 'bun:test' +import { SkillMdError, parseSkillMd } from '../src/types/skill-md.ts' + +describe('parseSkillMd', () => { + test('parses a minimal valid skill', () => { + const md = `--- +name: scaffold-and-run +description: Create then run in one turn. 
+--- + +Body goes here.` + const r = parseSkillMd(md) + expect(r.meta.name).toBe('scaffold-and-run') + expect(r.meta.description).toBe('Create then run in one turn.') + expect(r.meta.triggers).toEqual([]) + expect(r.meta.actions).toEqual([]) + expect(r.body).toBe('Body goes here.') + }) + + test('parses triggers and actions arrays', () => { + const md = `--- +name: x +description: y +triggers: + - audite + - test +actions: + - write + - run +--- + +body` + const r = parseSkillMd(md) + expect(r.meta.triggers).toEqual(['audite', 'test']) + expect(r.meta.actions).toEqual(['write', 'run']) + }) + + test('rejects a non kebab-case name', () => { + const md = `--- +name: ScaffoldAndRun +description: invalid +--- + +body` + expect(() => parseSkillMd(md)).toThrow(SkillMdError) + }) + + test('rejects missing frontmatter', () => { + expect(() => parseSkillMd('# no frontmatter')).toThrow(SkillMdError) + }) + + test('rejects an unknown action tag', () => { + const md = `--- +name: x +description: y +actions: + - bogus +--- + +body` + expect(() => parseSkillMd(md)).toThrow(SkillMdError) + }) +}) diff --git a/packages/core/tests/system-prompt.test.ts b/packages/core/tests/system-prompt.test.ts new file mode 100644 index 0000000..7ba745f --- /dev/null +++ b/packages/core/tests/system-prompt.test.ts @@ -0,0 +1,38 @@ +// System prompt — verify that the skill catalog is injected when +// provided, and that the prompt stays untouched when the catalog is +// empty. 
+ +import { describe, expect, test } from 'bun:test' +import { getBuilderSystemPrompt } from '../src/builder/system-prompt.ts' + +describe('getBuilderSystemPrompt', () => { + test('returns the base prompt when no skills are provided', () => { + const en = getBuilderSystemPrompt('en') + expect(en).toContain('Agent Forge builder') + expect(en).not.toContain('AVAILABLE SKILLS') + }) + + test('appends a SKILLS section when entries are passed', () => { + const en = getBuilderSystemPrompt('en', { + skills: [ + { + name: 'scaffold-and-run', + description: 'Create then run.', + triggers: ['audite', 'test'], + }, + ], + }) + expect(en).toContain('AVAILABLE SKILLS') + expect(en).toContain('scaffold-and-run') + expect(en).toContain('Create then run.') + expect(en).toContain('audite, test') + }) + + test('FR variant uses French headers', () => { + const fr = getBuilderSystemPrompt('fr', { + skills: [{ name: 'x', description: 'y', triggers: [] }], + }) + expect(fr).toContain('SKILLS DISPONIBLES') + expect(fr).not.toContain('AVAILABLE SKILLS') + }) +}) From 96cb4aec07f3a4bc6f9642370a09adca94849bc1 Mon Sep 17 00:00:00 2001 From: Georges Garnier Date: Mon, 27 Apr 2026 15:31:00 +0200 Subject: [PATCH 07/11] fix(p6): make skill activation actually fire on Mistral Small MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Symptom : the user typed a message that matched a skill trigger ("audite un projet typescript") but the builder still skipped forge:skill and went straight to forge:write. The skill catalog was loaded, the system prompt mentioned it, but Mistral would not act on it. Two issues, both fixed here : 1. Position. The SKILLS section was appended AFTER the base prompt's "BE DECISIVE — propose the AGENT.md immediately" rule. Mistral read the strong push to write first, then a soft "you can also use a skill" at the bottom, and ignored the latter. 2. Framing. The original wording said "choose a skill when ...". 
Too permissive — small models read that as optional. Replaced with a STEP 0 / ÉTAPE 0 framing : an explicit, mandatory pre-flight check that runs BEFORE any other action. If any trigger phrase matches (case-insensitive substring), the LLM MUST emit a forge:skill block as the only action of that turn ; only then does the rest of the protocol apply. The catalog is now placed at the TOP of the system prompt, before the "be decisive" rule, so the order of reading mirrors the order of execution. Tests updated for the new wording. Triggers are now quoted in the catalog rendering ("audite", "teste") so the LLM sees them as literals rather than running prose. --- packages/core/src/builder/system-prompt.ts | 41 +++++++++++++++------- packages/core/tests/system-prompt.test.ts | 17 +++++---- 2 files changed, 39 insertions(+), 19 deletions(-) diff --git a/packages/core/src/builder/system-prompt.ts b/packages/core/src/builder/system-prompt.ts index b680de0..13f4fbe 100644 --- a/packages/core/src/builder/system-prompt.ts +++ b/packages/core/src/builder/system-prompt.ts @@ -154,34 +154,42 @@ export type SkillCatalogEntry = { triggers: string[] } -const SKILLS_HEADER_EN = ` +const SKILLS_PREAMBLE_EN = `STEP 0 — SKILL CHECK (mandatory, runs BEFORE any other action) : -AVAILABLE SKILLS : +Before doing ANYTHING else, scan the user's message for a skill trigger. The catalog below lists each skill, what it does, and the trigger phrases that activate it. If ANY trigger phrase appears in the user's message (case-insensitive, substring match counts), you MUST : -You have access to a catalog of skills — high-level behaviours that orchestrate multiple actions in one turn for recurring intents. To activate one, emit a fenced \`forge:skill\` block. The CLI will inject the skill's full instructions as a system message in the next turn, after which you follow them. +1. Emit a fenced \`forge:skill\` block as your FIRST and ONLY action of this turn. +2. 
Do NOT also emit forge:write or forge:run in the same turn — wait for the skill body to be injected. +3. The next turn will arrive with the skill's full instructions as a system message ; only then follow the rest of the protocol. + +Example (the user said "audite un projet typescript", "audite" is a trigger of scaffold-and-run) : \`\`\`forge:skill name: scaffold-and-run \`\`\` -Choose a skill when the user's message matches its trigger phrases AND you would otherwise stop too early (e.g. only writing an AGENT.md when the user clearly also wants the agent to run with a specific task). +Skip this step ONLY if NO trigger matches. In that case, fall through to the default protocol below. -Catalog : +Skill catalog : ` -const SKILLS_HEADER_FR = ` +const SKILLS_PREAMBLE_FR = `ÉTAPE 0 — VÉRIFICATION DE SKILL (obligatoire, AVANT toute autre action) : + +Avant TOUTE autre chose, analyse le message de l'utilisateur pour repérer un trigger de skill. Le catalogue ci-dessous liste chaque skill, ce qu'elle fait, et les phrases déclencheuses. Si UN seul trigger apparaît dans le message de l'utilisateur (insensible à la casse, sous-chaîne suffit), tu DOIS : -SKILLS DISPONIBLES : +1. Émettre un bloc \`forge:skill\` encadré comme PREMIÈRE et SEULE action de ce tour. +2. Ne PAS émettre aussi un forge:write ou un forge:run dans le même tour — attends que le corps de la skill soit injecté. +3. Au tour suivant, les instructions complètes de la skill arriveront en message système ; tu n'auras plus qu'à les suivre. -Tu as accès à un catalogue de skills — des comportements de haut niveau qui orchestrent plusieurs actions dans le même tour pour des intentions récurrentes. Pour en activer une, émets un bloc \`forge:skill\` encadré. La CLI injectera les instructions complètes de la skill comme message système dans le tour suivant ; tu n'as plus qu'à les appliquer. 
+Exemple (l'utilisateur dit « audite un projet typescript », « audite » est un trigger de scaffold-and-run) : \`\`\`forge:skill name: scaffold-and-run \`\`\` -Choisis une skill quand le message utilisateur correspond à un de ses triggers ET que sans elle tu t'arrêterais trop tôt (par ex. n'écrire qu'un AGENT.md alors que l'utilisateur veut clairement aussi le lancer avec une tâche concrète). +Ne passe cette étape QUE si AUCUN trigger ne matche. Dans ce cas seulement, applique le protocole par défaut ci-dessous. -Catalogue : +Catalogue de skills : ` function renderCatalog(entries: SkillCatalogEntry[]): string { @@ -189,7 +197,9 @@ function renderCatalog(entries: SkillCatalogEntry[]): string { return entries .map((s) => { const triggers = - s.triggers.length > 0 ? ` — triggers : ${s.triggers.join(', ')}` : '' + s.triggers.length > 0 + ? ` — triggers : ${s.triggers.map((t) => `"${t}"`).join(', ')}` + : '' return `- ${s.name} : ${s.description}${triggers}` }) .join('\n') @@ -202,6 +212,11 @@ export function getBuilderSystemPrompt( const base = lang === 'fr' ? FR : EN const entries = options.skills ?? [] if (entries.length === 0) return base - const header = lang === 'fr' ? SKILLS_HEADER_FR : SKILLS_HEADER_EN - return `${base}${header}${renderCatalog(entries)}` + const preamble = + lang === 'fr' ? SKILLS_PREAMBLE_FR : SKILLS_PREAMBLE_EN + // Place skills preamble BEFORE the base prompt so the LLM reads the + // skill check first. The base prompt's "be decisive, write + // immediately" rule has been pushing the model to skip skills ; this + // ordering plus the explicit STEP 0 framing fixes that. 
+ return `${preamble}${renderCatalog(entries)}\n\n---\n\n${base}` } diff --git a/packages/core/tests/system-prompt.test.ts b/packages/core/tests/system-prompt.test.ts index 7ba745f..db6bbfd 100644 --- a/packages/core/tests/system-prompt.test.ts +++ b/packages/core/tests/system-prompt.test.ts @@ -9,10 +9,10 @@ describe('getBuilderSystemPrompt', () => { test('returns the base prompt when no skills are provided', () => { const en = getBuilderSystemPrompt('en') expect(en).toContain('Agent Forge builder') - expect(en).not.toContain('AVAILABLE SKILLS') + expect(en).not.toContain('STEP 0 — SKILL CHECK') }) - test('appends a SKILLS section when entries are passed', () => { + test('prepends a SKILL CHECK preamble when entries are passed', () => { const en = getBuilderSystemPrompt('en', { skills: [ { @@ -22,17 +22,22 @@ describe('getBuilderSystemPrompt', () => { }, ], }) - expect(en).toContain('AVAILABLE SKILLS') + expect(en).toContain('STEP 0 — SKILL CHECK') expect(en).toContain('scaffold-and-run') expect(en).toContain('Create then run.') - expect(en).toContain('audite, test') + expect(en).toContain('"audite", "test"') + // Preamble must come BEFORE the base prompt so the model reads the + // skill rule before the "be decisive, write immediately" rule. 
+ expect(en.indexOf('STEP 0 — SKILL CHECK')).toBeLessThan( + en.indexOf('Agent Forge builder'), + ) }) test('FR variant uses French headers', () => { const fr = getBuilderSystemPrompt('fr', { skills: [{ name: 'x', description: 'y', triggers: [] }], }) - expect(fr).toContain('SKILLS DISPONIBLES') - expect(fr).not.toContain('AVAILABLE SKILLS') + expect(fr).toContain('ÉTAPE 0 — VÉRIFICATION DE SKILL') + expect(fr).not.toContain('STEP 0 — SKILL CHECK') }) }) From 02627b45e5c5c7c90dd55bdf98b518cb58e99dac Mon Sep 17 00:00:00 2001 From: Georges Garnier Date: Mon, 27 Apr 2026 15:43:13 +0200 Subject: [PATCH 08/11] fix(p6): server-side skill dispatch (small models can't be trusted) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Mistral Small reads the skill catalog and the STEP 0 instruction in the system prompt, but it does not act on it : it sees "audite a typescript project" and goes straight to a forge:write that collapses both the agent definition and the run mission into one giant AGENT.md body. Adding more rules to the prompt didn't move the needle. Plan B, in three pieces : 1. matchSkillForMessage() — case-insensitive substring match against the trigger phrases declared in each SKILL.md. Lives in core, no LLM involvement. 2. runScaffoldAndRun() — a dedicated runner that drives the skill end to end with TWO narrow LLM calls instead of one wide one : - call A : "produce ONLY the AGENT.md content" (generic role, no session-specific steps in the body) - call B : "produce ONLY the prompt to send to the agent" Each call has a tightly scoped system instruction so the model keeps the two artefacts cleanly separated. Output is parsed server-side, AGENT.md name extracted from the frontmatter. 3. useChat.send() — pre-flight before the normal stream : if the matcher finds a skill, dispatch to the runner. The skill card lands in Mission Control as DONE, then a write card and a run card appear as PROPOSED. 
The user approves them in order via the existing permission dialog. The system prompt no longer carries the STEP 0 / ÉTAPE 0 mandate. Skills are now an internal mechanism the LLM is informed about but never asked to operate. The catalog metadata stays in the prompt as a short tail note so the LLM understands why a skill card might appear in Mission Control. Tests : - matchSkillForMessage : substring match, no-match, multi-skill precedence (first wins), empty trigger ignored. - system prompt : informational note appears when skills are passed, FR/EN variants, base prompt comes first. --- packages/cli/src/hooks/useChat.ts | 93 +++++++++++- packages/core/src/builder/index.ts | 5 + packages/core/src/builder/skill-matcher.ts | 41 ++++++ packages/core/src/builder/skill-runner.ts | 162 +++++++++++++++++++++ packages/core/src/builder/system-prompt.ts | 55 ++----- packages/core/tests/skill-matcher.test.ts | 43 ++++++ packages/core/tests/system-prompt.test.ts | 26 ++-- 7 files changed, 371 insertions(+), 54 deletions(-) create mode 100644 packages/core/src/builder/skill-matcher.ts create mode 100644 packages/core/src/builder/skill-runner.ts create mode 100644 packages/core/tests/skill-matcher.test.ts diff --git a/packages/cli/src/hooks/useChat.ts b/packages/cli/src/hooks/useChat.ts index 12d67e3..83b42c9 100644 --- a/packages/cli/src/hooks/useChat.ts +++ b/packages/cli/src/hooks/useChat.ts @@ -13,6 +13,8 @@ import { type ChatMessage, loadSkillCatalog, + matchSkillForMessage, + runScaffoldAndRun, streamBuilder, } from '@agent-forge/core/builder' import { launchAgent } from '@agent-forge/tools-core' @@ -335,6 +337,95 @@ export function useChat(lang: Lang): { })) setBusy(true) + // Server-side skill matching : if a trigger phrase appears in the + // user message, dispatch to the dedicated runner instead of the + // generic streaming flow. The runner makes two narrow LLM calls + // (one per artefact) so small models keep the AGENT.md and the + // run prompt cleanly separated. 
+ const matched = matchSkillForMessage(prompt, skillCatalog.skills) + if (matched && matched.name === 'scaffold-and-run') { + const skillCard: SkillAction = { + id: nextActionId(), + kind: 'skill', + status: 'running', + skill: matched.name, + description: matched.description, + createdAt: nowIso(), + } + setState((prev) => ({ + ...prev, + streaming: null, + actions: [...prev.actions, skillCard], + })) + try { + const result = await runScaffoldAndRun({ + userMessage: prompt, + lang, + }) + if (!result) { + updateAction(skillCard.id, { + status: 'failed', + error: 'skill runner produced no usable output', + finishedAt: nowIso(), + }) + setBusy(false) + return + } + // Mark the skill as done and surface a write + run pair as + // proposed cards. The user approves them in order via the + // permission dialog. + updateAction(skillCard.id, { + status: 'done', + body: matched.body, + finishedAt: nowIso(), + }) + const writeCard: WriteAction = { + id: nextActionId(), + kind: 'write', + status: 'proposed', + path: `agents/${result.agentName}/AGENT.md`, + content: result.agentMdContent, + createdAt: nowIso(), + } + const runCard: RunAction = { + id: nextActionId(), + kind: 'run', + status: 'proposed', + agent: result.agentName, + prompt: result.runPrompt, + createdAt: nowIso(), + output: '', + } + // Final assistant turn : one short prose sentence so the user + // sees in the conversation that the skill fired. + const proseTurn: ChatTurn = { + id: nextId(), + role: 'assistant', + content: + lang === 'fr' + ? `Je charge la skill ${matched.name} : un AGENT.md à approuver, puis l'exécution.` + : `Loading skill ${matched.name} : one AGENT.md to approve, then the run.`, + } + persist(proseTurn) + setState((prev) => ({ + ...prev, + messages: [...prev.messages, proseTurn], + actions: [...prev.actions, writeCard, runCard], + })) + } catch (err) { + const msg = err instanceof Error ? 
err.message : String(err) + updateAction(skillCard.id, { + status: 'failed', + error: msg, + finishedAt: nowIso(), + }) + setState((prev) => ({ ...prev, error: msg })) + } finally { + setBusy(false) + } + return + } + try { const history: ChatMessage[] = [ ...hiddenHistoryRef.current @@ -450,7 +541,7 @@ export function useChat(lang: Lang): { setBusy(false) } }, - [state.messages, lang], + [state.messages, lang, skillCatalog, skillEntries, updateAction], ) return { diff --git a/packages/core/src/builder/index.ts b/packages/core/src/builder/index.ts index 21af6a2..915d875 100644 --- a/packages/core/src/builder/index.ts +++ b/packages/core/src/builder/index.ts @@ -18,3 +18,8 @@ export { type SkillCatalog, type SkillEntry, } from './skill-catalog.ts' +export { matchSkillForMessage } from './skill-matcher.ts' +export { + runScaffoldAndRun, + type ScaffoldAndRunResult, +} from './skill-runner.ts' diff --git a/packages/core/src/builder/skill-matcher.ts b/packages/core/src/builder/skill-matcher.ts new file mode 100644 index 0000000..a536240 --- /dev/null +++ b/packages/core/src/builder/skill-matcher.ts @@ -0,0 +1,41 @@ +// Server-side skill trigger matching. +// +// Small models (Mistral Small, MLX local) don't reliably emit +// forge:skill even when the system prompt says they MUST. Plan B : +// the CLI matches triggers itself before calling the LLM. If a +// trigger phrase appears as a substring of the user message +// (case-insensitive), the matched skill is auto-loaded : its body is +// injected into the conversation as a system message, and a +// SkillAction (status=done) is added to Mission Control. The LLM +// then sees the skill instructions as if it had asked for them +// itself, and the next turn follows the orchestration described in +// the skill body. + +import type { SkillEntry } from './skill-catalog.ts' + +/** + * Returns the FIRST skill whose triggers match the user message, or + * null if none match. 
Match is case-insensitive substring : we trim + * the trigger and lower-case both sides before comparing. We don't + * need a fuzzy matcher — skills define their own trigger phrases, so + * authors can list as many synonyms as they like. + * + * The first match wins because skills are sorted alphabetically in + * the catalog ; if two skills compete on a message, the first one + * lexicographically takes precedence. That's deterministic and easy + * to reason about ; we'll revisit if real conflicts appear. + */ +export function matchSkillForMessage( + message: string, + skills: SkillEntry[], +): SkillEntry | null { + const haystack = message.toLowerCase() + for (const skill of skills) { + for (const trigger of skill.triggers) { + const needle = trigger.trim().toLowerCase() + if (needle.length === 0) continue + if (haystack.includes(needle)) return skill + } + } + return null +} diff --git a/packages/core/src/builder/skill-runner.ts b/packages/core/src/builder/skill-runner.ts new file mode 100644 index 0000000..ccf3bb9 --- /dev/null +++ b/packages/core/src/builder/skill-runner.ts @@ -0,0 +1,162 @@ +// Skill runner — deterministic orchestration for skills that small +// models can't reliably handle through prompt instructions alone. +// +// Today this only knows how to drive `scaffold-and-run`. The shape is +// generic enough that other skills can plug in : each runner takes +// the user prompt, calls the LLM with a tightly scoped instruction +// (one block to produce, nothing else), and returns either the +// generated content or null on failure. The CLI assembles the +// resulting actions in Mission Control. +// +// The win over a single LLM call : Mistral Small collapses +// "what the agent is" (AGENT.md) and "what the agent should do this +// time" (forge:run prompt) into one big system prompt. Splitting the +// work into two narrow calls forces the model to keep them apart. 
+ +import { generateText } from 'ai' +import { getBuilderModel } from './provider.ts' +import type { BuilderLang } from './system-prompt.ts' + +export type ScaffoldAndRunResult = { + agentName: string + agentMdContent: string // full AGENT.md (frontmatter + body), no fences + runPrompt: string // prompt to feed forge:run +} + +const AGENT_MD_INSTRUCTION_FR = `Tu es un assistant qui produit UNIQUEMENT le contenu d'un fichier AGENT.md, rien d'autre. + +Format obligatoire (commence par \`---\`, finis par \`---\` puis le corps) : + +--- +name: +description: "Une phrase courte décrivant le rôle GÉNÉRIQUE de l'agent (pas la mission spécifique de cette session)." +sandbox: + image: agent-forge/base:latest + timeout: 120s +maxTurns: 8 +--- + +# + +Tu es un . Décris en 2 à 4 lignes le rôle GÉNÉRIQUE de l'agent. Mentionne brièvement les outils dont il dispose (forge:bash, forge:write, forge:read, forge:edit, forge:grep, forge:glob, sandboxés sous /workspace). NE liste PAS d'étapes spécifiques à la session courante — ces étapes seront passées séparément en prompt run. + +RÈGLES STRICTES : +- Ne produis QUE le contenu du fichier AGENT.md, sans \`\`\` ni texte avant/après. +- La valeur de \`description\` ne doit JAMAIS contenir de deux-points non quoté. +- N'invente pas de section "Étapes" ou "Mission" dans le corps : elles iront dans le prompt run. +- Réponds en français.` + +const AGENT_MD_INSTRUCTION_EN = `You output ONLY the content of an AGENT.md file, nothing else. + +Required format (start with \`---\`, end with \`---\` then the body) : + +--- +name: +description: "One short sentence describing the GENERIC role of the agent (not the specific mission of this session)." +sandbox: + image: agent-forge/base:latest + timeout: 120s +maxTurns: 8 +--- + +# + +You are a . Describe the GENERIC role in 2-4 lines. Briefly mention the tools available (forge:bash, forge:write, forge:read, forge:edit, forge:grep, forge:glob, sandboxed under /workspace). 
Do NOT list session-specific steps — those will be passed separately as the run prompt. + +STRICT RULES : +- Output ONLY the AGENT.md content, no \`\`\` and no prose before/after. +- The \`description\` value must NEVER contain an unquoted colon. +- Do not invent a "Steps" or "Mission" section in the body : that goes in the run prompt. +- Answer in English.` + +const RUN_PROMPT_INSTRUCTION_FR = `Tu es un assistant qui produit UNIQUEMENT le prompt à envoyer à un agent, rien d'autre. + +Tu vas extraire de la demande utilisateur la MISSION CONCRÈTE à exécuter, et la reformuler comme une INSTRUCTION DIRECTE adressée à l'agent (à la 2ème personne du singulier en français : « tu vas… »). Cette instruction sera passée à l'agent via un bloc forge:run. + +RÈGLES STRICTES : +- Produis UNIQUEMENT le texte du prompt, sans \`\`\`, sans préambule, sans explication. +- Décris des étapes concrètes et exécutables (pas de méta-discours). +- Ne ré-explique PAS le rôle de l'agent, il est déjà défini dans son AGENT.md. +- Si la demande mentionne du code à scaffolder, sois explicite sur le contenu attendu. +- Termine par : « Réponds en français. »` + +const RUN_PROMPT_INSTRUCTION_EN = `You output ONLY the prompt to send to an agent, nothing else. + +You extract from the user's message the CONCRETE MISSION to execute, and rephrase it as a DIRECT INSTRUCTION to the agent (second person : "you will…"). This instruction will be passed to the agent through a forge:run block. + +STRICT RULES : +- Output ONLY the prompt text, no \`\`\`, no preamble, no explanation. +- Describe concrete executable steps (no meta-talk). +- Do NOT re-explain the role of the agent, it's already defined in its AGENT.md. +- If the user mentioned code to scaffold, be explicit about the expected content. +- End with : "Answer in English."` + +function buildAgentMdInstruction(lang: BuilderLang): string { + return lang === 'fr' ? 
AGENT_MD_INSTRUCTION_FR : AGENT_MD_INSTRUCTION_EN +} + +function buildRunPromptInstruction(lang: BuilderLang): string { + return lang === 'fr' ? RUN_PROMPT_INSTRUCTION_FR : RUN_PROMPT_INSTRUCTION_EN +} + +const NAME_RE = /name\s*:\s*([a-z][a-z0-9-]*)/i + +function extractAgentName(agentMd: string): string | null { + const m = NAME_RE.exec(agentMd) + return m && m[1] ? m[1] : null +} + +function stripFences(text: string): string { + // The instruction tells the model NOT to wrap output in fences, but + // small models slip — strip a leading and trailing ``` if present. + let out = text.trim() + if (out.startsWith('```')) { + const firstNl = out.indexOf('\n') + if (firstNl !== -1) out = out.slice(firstNl + 1) + } + if (out.endsWith('```')) { + out = out.slice(0, -3).trimEnd() + } + return out.trim() +} + +/** + * Drive the scaffold-and-run skill end to end. Two narrow LLM calls, + * each producing exactly one artefact. The CLI then surfaces them as + * a write action + a run action in Mission Control. + * + * Returns null if either call fails to produce a recognisable + * artefact (e.g. AGENT.md without a `name:` line). The caller falls + * back to the normal flow. + */ +export async function runScaffoldAndRun(args: { + userMessage: string + lang: BuilderLang +}): Promise<ScaffoldAndRunResult | null> { + const model = getBuilderModel() + const agentMdInstruction = buildAgentMdInstruction(args.lang) + const runPromptInstruction = buildRunPromptInstruction(args.lang) + + // Call 1 : produce the AGENT.md. + const agentMdResp = await generateText({ + model, + system: agentMdInstruction, + prompt: args.userMessage, + maxTokens: 600, + }) + const agentMdContent = stripFences(agentMdResp.text) + const agentName = extractAgentName(agentMdContent) + if (!agentName) return null + + // Call 2 : produce the run prompt. 
+ const runResp = await generateText({ + model, + system: runPromptInstruction, + prompt: args.userMessage, + maxTokens: 400, + }) + const runPrompt = stripFences(runResp.text) + if (runPrompt.length === 0) return null + + return { agentName, agentMdContent, runPrompt } +} diff --git a/packages/core/src/builder/system-prompt.ts b/packages/core/src/builder/system-prompt.ts index 13f4fbe..882adc5 100644 --- a/packages/core/src/builder/system-prompt.ts +++ b/packages/core/src/builder/system-prompt.ts @@ -154,42 +154,22 @@ export type SkillCatalogEntry = { triggers: string[] } -const SKILLS_PREAMBLE_EN = `STEP 0 — SKILL CHECK (mandatory, runs BEFORE any other action) : - -Before doing ANYTHING else, scan the user's message for a skill trigger. The catalog below lists each skill, what it does, and the trigger phrases that activate it. If ANY trigger phrase appears in the user's message (case-insensitive, substring match counts), you MUST : - -1. Emit a fenced \`forge:skill\` block as your FIRST and ONLY action of this turn. -2. Do NOT also emit forge:write or forge:run in the same turn — wait for the skill body to be injected. -3. The next turn will arrive with the skill's full instructions as a system message ; only then follow the rest of the protocol. - -Example (the user said "audite un projet typescript", "audite" is a trigger of scaffold-and-run) : - -\`\`\`forge:skill -name: scaffold-and-run -\`\`\` - -Skip this step ONLY if NO trigger matches. In that case, fall through to the default protocol below. +// Note : skill activation is now handled SERVER-SIDE by the CLI, not +// by the LLM. Trigger matching, runner dispatch, and write+run +// orchestration all happen in TypeScript before the LLM is even +// called for the matched user message. This keeps the small models +// out of the meta-decision business and makes the orchestration +// deterministic. 
+// +// We still surface the skill catalog in the system prompt as a short +// informational note, so the LLM doesn't get confused when a skill +// card appears in Mission Control — it knows skills exist and that +// they were dispatched on its behalf. -Skill catalog : +const SKILLS_PREAMBLE_EN = `Skills available (auto-dispatched by the CLI when the user message matches a trigger ; you do NOT need to invoke them yourself) : ` -const SKILLS_PREAMBLE_FR = `ÉTAPE 0 — VÉRIFICATION DE SKILL (obligatoire, AVANT toute autre action) : - -Avant TOUTE autre chose, analyse le message de l'utilisateur pour repérer un trigger de skill. Le catalogue ci-dessous liste chaque skill, ce qu'elle fait, et les phrases déclencheuses. Si UN seul trigger apparaît dans le message de l'utilisateur (insensible à la casse, sous-chaîne suffit), tu DOIS : - -1. Émettre un bloc \`forge:skill\` encadré comme PREMIÈRE et SEULE action de ce tour. -2. Ne PAS émettre aussi un forge:write ou un forge:run dans le même tour — attends que le corps de la skill soit injecté. -3. Au tour suivant, les instructions complètes de la skill arriveront en message système ; tu n'auras plus qu'à les suivre. - -Exemple (l'utilisateur dit « audite un projet typescript », « audite » est un trigger de scaffold-and-run) : - -\`\`\`forge:skill -name: scaffold-and-run -\`\`\` - -Ne passe cette étape QUE si AUCUN trigger ne matche. Dans ce cas seulement, applique le protocole par défaut ci-dessous. - -Catalogue de skills : +const SKILLS_PREAMBLE_FR = `Skills disponibles (déclenchées automatiquement par la CLI quand le message utilisateur correspond à un trigger ; tu n'as PAS à les invoquer toi-même) : ` function renderCatalog(entries: SkillCatalogEntry[]): string { @@ -212,11 +192,6 @@ export function getBuilderSystemPrompt( const base = lang === 'fr' ? FR : EN const entries = options.skills ?? [] if (entries.length === 0) return base - const preamble = - lang === 'fr' ? 
SKILLS_PREAMBLE_FR : SKILLS_PREAMBLE_EN - // Place skills preamble BEFORE the base prompt so the LLM reads the - // skill check first. The base prompt's "be decisive, write - // immediately" rule has been pushing the model to skip skills ; this - // ordering plus the explicit STEP 0 framing fixes that. - return `${preamble}${renderCatalog(entries)}\n\n---\n\n${base}` + const preamble = lang === 'fr' ? SKILLS_PREAMBLE_FR : SKILLS_PREAMBLE_EN + return `${base}\n\n${preamble}${renderCatalog(entries)}` } diff --git a/packages/core/tests/skill-matcher.test.ts b/packages/core/tests/skill-matcher.test.ts new file mode 100644 index 0000000..1950385 --- /dev/null +++ b/packages/core/tests/skill-matcher.test.ts @@ -0,0 +1,43 @@ +import { describe, expect, test } from 'bun:test' +import { matchSkillForMessage } from '../src/builder/skill-matcher.ts' +import type { SkillEntry } from '../src/builder/skill-catalog.ts' + +const fakeSkill = ( + name: string, + triggers: string[], +): SkillEntry => ({ + name, + description: 'desc', + triggers, + actions: [], + body: 'body', + source: 'builtin', + filePath: '', +}) + +describe('matchSkillForMessage', () => { + test('matches a trigger as case-insensitive substring', () => { + const skill = fakeSkill('scaffold-and-run', ['audite', 'teste']) + const r = matchSkillForMessage('Audite ce projet TypeScript stp', [skill]) + expect(r?.name).toBe('scaffold-and-run') + }) + + test('returns null when no trigger matches', () => { + const skill = fakeSkill('scaffold-and-run', ['audite']) + const r = matchSkillForMessage('crée un agent qui écrit des haïkus', [skill]) + expect(r).toBeNull() + }) + + test('first skill in the list wins on multi-match', () => { + const a = fakeSkill('a-skill', ['shared']) + const b = fakeSkill('b-skill', ['shared']) + const r = matchSkillForMessage('shared keyword present', [a, b]) + expect(r?.name).toBe('a-skill') + }) + + test('empty trigger is ignored', () => { + const skill = fakeSkill('x', ['', ' ']) + const r = 
matchSkillForMessage('anything goes here', [skill]) + expect(r).toBeNull() + }) +}) diff --git a/packages/core/tests/system-prompt.test.ts b/packages/core/tests/system-prompt.test.ts index db6bbfd..6268597 100644 --- a/packages/core/tests/system-prompt.test.ts +++ b/packages/core/tests/system-prompt.test.ts @@ -1,6 +1,6 @@ -// System prompt — verify that the skill catalog is injected when -// provided, and that the prompt stays untouched when the catalog is -// empty. +// System prompt — verify that the skill catalog metadata is appended +// when entries are provided (skills are auto-dispatched by the CLI ; +// the LLM only sees them as an informational note). import { describe, expect, test } from 'bun:test' import { getBuilderSystemPrompt } from '../src/builder/system-prompt.ts' @@ -9,10 +9,10 @@ describe('getBuilderSystemPrompt', () => { test('returns the base prompt when no skills are provided', () => { const en = getBuilderSystemPrompt('en') expect(en).toContain('Agent Forge builder') - expect(en).not.toContain('STEP 0 — SKILL CHECK') + expect(en).not.toContain('Skills available') }) - test('prepends a SKILL CHECK preamble when entries are passed', () => { + test('appends an informational skill list when entries are passed', () => { const en = getBuilderSystemPrompt('en', { skills: [ { @@ -22,22 +22,22 @@ describe('getBuilderSystemPrompt', () => { }, ], }) - expect(en).toContain('STEP 0 — SKILL CHECK') + expect(en).toContain('Skills available') + expect(en).toContain('auto-dispatched') expect(en).toContain('scaffold-and-run') expect(en).toContain('Create then run.') expect(en).toContain('"audite", "test"') - // Preamble must come BEFORE the base prompt so the model reads the - // skill rule before the "be decisive, write immediately" rule. - expect(en.indexOf('STEP 0 — SKILL CHECK')).toBeLessThan( - en.indexOf('Agent Forge builder'), + // The base prompt comes first ; the skill note is a tail. 
+ expect(en.indexOf('Agent Forge builder')).toBeLessThan( + en.indexOf('Skills available'), ) }) - test('FR variant uses French headers', () => { + test('FR variant uses French wording', () => { const fr = getBuilderSystemPrompt('fr', { skills: [{ name: 'x', description: 'y', triggers: [] }], }) - expect(fr).toContain('ÉTAPE 0 — VÉRIFICATION DE SKILL') - expect(fr).not.toContain('STEP 0 — SKILL CHECK') + expect(fr).toContain('Skills disponibles') + expect(fr).toContain('automatiquement par la CLI') }) }) From 744a17c052367b8dee0d8e0590268b4d9bef278c Mon Sep 17 00:00:00 2001 From: Georges Garnier Date: Mon, 27 Apr 2026 15:54:23 +0200 Subject: [PATCH 09/11] feat(cli): syntax highlighting for skill / write / run detail views MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The detail screens (Esc-Tab-Enter on a Mission Control card) used to render every action body through highlightPlain — everything came back as undifferentiated grey, which made long AGENT.md / agent run outputs hard to scan. Replaced by per-shape highlighting : - skill detail : Markdown highlighter (headings, lists, inline code, bold, fenced blocks). The skill body is markdown, so this matches. - write detail : if the file has YAML frontmatter (which AGENT.md always does), split frontmatter and body. The frontmatter goes through the YAML highlighter (already existing), the body through the new Markdown one. Falls back to plain YAML for files without frontmatter. - run detail (and the compact run card in Mission Control) : a new highlightAgentRun() that walks the streamed output and recognises : · fenced ```forge:* blocks (open line orange, body via the matching language highlighter — JSON for forge:bash/write/etc.) 
· [forge:tool] / [/forge:tool] markers wrapping the result of the previous tool call (rendered dim grey so it visually recedes) · regular prose with inline code spans and bold New helpers in syntax.ts : highlightMarkdown(text) highlightAgentRun(text) highlightYamlLine / highlightJsonLine kept exported for the run highlighter to delegate to. The compact card in MissionControl now also uses highlightAgentRun so the streaming output during a long run reads the same as the detail view, just clipped at maxLines. --- packages/cli/src/components/CardDetail.tsx | 53 +++- .../cli/src/components/MissionControl.tsx | 4 +- packages/cli/src/components/syntax.ts | 289 ++++++++++++++++-- 3 files changed, 310 insertions(+), 36 deletions(-) diff --git a/packages/cli/src/components/CardDetail.tsx b/packages/cli/src/components/CardDetail.tsx index 2f0fa56..c149c0d 100644 --- a/packages/cli/src/components/CardDetail.tsx +++ b/packages/cli/src/components/CardDetail.tsx @@ -13,7 +13,8 @@ import { C } from '../theme/colors.ts' import { type HighlightedLine, type Segment, - highlightPlain, + highlightAgentRun, + highlightMarkdown, highlightYamlText, } from './syntax.ts' @@ -35,40 +36,64 @@ const STATUS_COLOR: Record = { declined: C.grey, } +function sectionHeader(label: string): HighlightedLine { + return [{ text: `── ${label} ──`, color: C.grey, dim: true }] +} + function buildLines(action: Action): HighlightedLine[] { if (action.kind === 'write') { + // AGENT.md = YAML frontmatter + Markdown body. Splitting them and + // highlighting each with its own grammar gives much better + // contrast than a single YAML pass over the whole file. + const frontmatterMatch = action.content.match( + /^---\s*\n([\s\S]*?)\n---\s*\n?([\s\S]*)$/, + ) + if (frontmatterMatch) { + const fmRaw = frontmatterMatch[1] ?? '' + const bodyRaw = frontmatterMatch[2] ?? 
'' + const out: HighlightedLine[] = [] + out.push([{ text: '---', color: C.grey, dim: true }]) + out.push(...highlightYamlText(fmRaw)) + out.push([{ text: '---', color: C.grey, dim: true }]) + if (bodyRaw.length > 0) { + out.push([{ text: ' ' }]) + out.push(...highlightMarkdown(bodyRaw)) + } + return out + } return highlightYamlText(action.content) } if (action.kind === 'skill') { const out: HighlightedLine[] = [] - out.push([{ text: '── description ──', color: C.grey, dim: true }]) - out.push(...highlightPlain(action.description)) - out.push([{ text: '' }]) - out.push([{ text: '── instructions injected into context ──', color: C.grey, dim: true }]) + out.push(sectionHeader('description')) + out.push(...highlightMarkdown(action.description)) + out.push([{ text: ' ' }]) + out.push(sectionHeader('instructions injected into context')) if (action.body && action.body.length > 0) { - out.push(...highlightPlain(action.body)) + out.push(...highlightMarkdown(action.body)) } else { out.push([{ text: '(skill body not loaded yet)', color: C.grey, dim: true }]) } if (action.status === 'failed' && action.error) { - out.push([{ text: '' }]) + out.push([{ text: ' ' }]) out.push([{ text: `✗ ${action.error}`, color: C.red }]) } return out } - // run : prompt then output + // run : prompt (markdown-ish prose) then output (mixed forge:* + + // [forge:tool] streams). 
const out: HighlightedLine[] = [] - out.push([{ text: '── prompt ──', color: C.grey, dim: true }]) - out.push(...highlightPlain(action.prompt)) - out.push([{ text: '' }]) - out.push([{ text: '── output ──', color: C.grey, dim: true }]) + out.push(sectionHeader('prompt')) + out.push(...highlightMarkdown(action.prompt)) + out.push([{ text: ' ' }]) + out.push(sectionHeader('output')) if (action.output.length > 0) { - out.push(...highlightPlain(action.output)) + out.push(...highlightAgentRun(action.output)) } else { out.push([{ text: '(empty)', color: C.grey, dim: true }]) } if (action.status === 'failed' && action.error) { - out.push([{ text: '' }]) + out.push([{ text: ' ' }]) out.push([{ text: `✗ ${action.error}`, color: C.red }]) } return out diff --git a/packages/cli/src/components/MissionControl.tsx b/packages/cli/src/components/MissionControl.tsx index d266aa2..2a61570 100644 --- a/packages/cli/src/components/MissionControl.tsx +++ b/packages/cli/src/components/MissionControl.tsx @@ -21,6 +21,7 @@ import { C } from '../theme/colors.ts' import { type HighlightedLine, type Segment, + highlightAgentRun, highlightPlain, highlightYamlText, } from './syntax.ts' @@ -172,7 +173,8 @@ function RunCard({ focused: boolean }): React.JSX.Element { const promptLines = highlightPlain(action.prompt) - const outputLines = action.output.length > 0 ? highlightPlain(action.output) : [] + const outputLines = + action.output.length > 0 ? highlightAgentRun(action.output) : [] return ( diff --git a/packages/cli/src/components/syntax.ts b/packages/cli/src/components/syntax.ts index d846f3d..bd3a911 100644 --- a/packages/cli/src/components/syntax.ts +++ b/packages/cli/src/components/syntax.ts @@ -1,33 +1,38 @@ -// Tiny, line-oriented syntax helpers for the MissionControl preview. -// Returns segments {text, color, dim?} that components can render with Ink. -// We deliberately avoid a real parser : agents emit small YAML / plain text -// blocks, a handful of regexes is enough. 
+// Tiny, line-oriented syntax helpers for Mission Control and the +// CardDetail view. Goals : +// - keep it dependency-free (regex only) ; +// - cover the four shapes Agent Forge actually shows : YAML, plain +// text, Markdown, and JSON-ish ; +// - recognise fenced blocks inside Markdown so a forge:bash inside +// an agent run reads as JSON, not as prose. +// +// Each highlighter returns a list of HighlightedLine ; a +// HighlightedLine is a list of Segment ({text, color, dim?, bold?}) +// that components render with Ink. import { C } from '../theme/colors.ts' -export type Segment = { text: string; color?: string; dim?: boolean; bold?: boolean } +export type Segment = { + text: string + color?: string + dim?: boolean + bold?: boolean +} export type HighlightedLine = Segment[] +// ── YAML ───────────────────────────────────────────────────────── + const YAML_KEY_RE = /^(\s*)([A-Za-z_][\w-]*)(\s*:)(\s*)(.*)$/ const YAML_LIST_RE = /^(\s*)(-)(\s+)(.*)$/ const YAML_SEPARATOR_RE = /^---\s*$/ const YAML_COMMENT_RE = /^(\s*)(#.*)$/ function valueSegment(value: string): Segment { - // Numbers - if (/^-?\d+(\.\d+)?$/.test(value)) { - return { text: value, color: C.greyLight } - } - // Quoted string - if (/^["'].*["']$/.test(value)) { - return { text: value, color: C.greyLight } - } - // Booleans / null - if (/^(true|false|null|yes|no)$/i.test(value)) { + if (/^-?\d+(\.\d+)?$/.test(value)) return { text: value, color: C.greyLight } + if (/^["'].*["']$/.test(value)) return { text: value, color: C.greyLight } + if (/^(true|false|null|yes|no)$/i.test(value)) return { text: value, color: C.orangeBright } - } - // Bare value return { text: value, color: C.white } } @@ -61,12 +66,9 @@ export function highlightYamlLine(line: string): HighlightedLine { { text: colon ?? '', color: C.grey }, { text: space ??
'' }, ] - if (value && value.length > 0) { - segs.push(valueSegment(value)) - } + if (value && value.length > 0) segs.push(valueSegment(value)) return segs } - // Markdown header inside body if (/^#\s/.test(line)) { return [{ text: line, color: C.orangeBright, bold: true }] } @@ -77,8 +79,253 @@ export function highlightYamlText(text: string): HighlightedLine[] { return text.split('\n').map(highlightYamlLine) } +// ── Plain ──────────────────────────────────────────────────────── + export function highlightPlain(text: string): HighlightedLine[] { return text .split('\n') .map((l) => [{ text: l.length > 0 ? l : ' ', color: C.greyLight }]) } + +// ── JSON ───────────────────────────────────────────────────────── +// +// Tokeniser-light : single line at a time. We don't try to follow +// multi-line strings — agents rarely emit them. The aim is colour, +// not validation. + +const JSON_TOKEN_RE = /"(?:[^"\\]|\\.)*"|true|false|null|-?\d+(?:\.\d+)?/g + +function highlightJsonLine(line: string): HighlightedLine { + if (line.length === 0) return [{ text: ' ' }] + const segs: HighlightedLine = [] + let last = 0 + for (const m of line.matchAll(JSON_TOKEN_RE)) { + const idx = m.index ?? 0 + if (idx > last) segs.push({ text: line.slice(last, idx), color: C.grey }) + const tok = m[0] + if (tok.startsWith('"')) { + // Heuristic : a quoted string immediately followed by ':' is a key, + // colour as orange ; otherwise a value (greyLight). 
+ const after = line.slice(idx + tok.length).trimStart() + if (after.startsWith(':')) { + segs.push({ text: tok, color: C.orange, bold: true }) + } else { + segs.push({ text: tok, color: C.greyLight }) + } + } else if (tok === 'true' || tok === 'false' || tok === 'null') { + segs.push({ text: tok, color: C.orangeBright }) + } else { + segs.push({ text: tok, color: C.white }) + } + last = idx + tok.length + } + if (last < line.length) segs.push({ text: line.slice(last), color: C.grey }) + return segs +} + +// ── Markdown (with fenced blocks) ──────────────────────────────── +// +// Recognises : +// - ATX headings (#, ##, ...) +// - Unordered list bullets (-, *, +) +// - Ordered list bullets (1. 2. ...) +// - Inline code spans (`...`) +// - Bold (**...**) and emphasis (*...*) — colour only, no font +// - Fenced code blocks ```lang ... ``` : the content is forwarded +// to the matching highlighter (yaml/json/plain), and the fences +// themselves render dim grey +// +// Special-case our own fence prefix `forge:*` : the body is JSON-ish, +// route it to the JSON highlighter. + +const HEADING_RE = /^(#{1,6})\s+(.*)$/ +const ULIST_RE = /^(\s*)([-*+])(\s+)(.*)$/ +const OLIST_RE = /^(\s*)(\d+\.)(\s+)(.*)$/ +const FENCE_OPEN_RE = /^```(\S*)\s*$/ +const FENCE_CLOSE_RE = /^```\s*$/ +const INLINE_CODE_RE = /`[^`]+`/g +const BOLD_RE = /\*\*[^*]+\*\*/g + +function languageHighlighter(lang: string): (line: string) => HighlightedLine { + const l = lang.toLowerCase() + if (l === 'yaml' || l === 'yml') return highlightYamlLine + if (l === 'json' || l.startsWith('forge:')) return highlightJsonLine + if (l === 'bash' || l === 'sh' || l === 'shell') { + return (line) => [{ text: line.length > 0 ? line : ' ', color: C.greyLight }] + } + // Default for unknown / TypeScript / etc. : neutral grey-light. + return (line) => [{ text: line.length > 0 ? line : ' ', color: C.greyLight }] +} + +// Apply inline code spans and bold to a Markdown prose line. Returns +// a list of segments. 
Order doesn't matter because the matched +// regions don't overlap in practice (we don't try to nest them). +function highlightInlineMarkdown(line: string): HighlightedLine { + type Mark = { start: number; end: number; seg: Segment } + const marks: Mark[] = [] + for (const m of line.matchAll(INLINE_CODE_RE)) { + if (m.index === undefined) continue + marks.push({ + start: m.index, + end: m.index + m[0].length, + seg: { text: m[0], color: C.orangeBright }, + }) + } + for (const m of line.matchAll(BOLD_RE)) { + if (m.index === undefined) continue + // Skip if overlaps an existing inline-code mark. + const overlap = marks.some( + (e) => + !(e.end <= (m.index ?? 0) || e.start >= (m.index ?? 0) + m[0].length), + ) + if (overlap) continue + marks.push({ + start: m.index, + end: m.index + m[0].length, + seg: { text: m[0], color: C.white, bold: true }, + }) + } + if (marks.length === 0) return [{ text: line, color: C.greyLight }] + marks.sort((a, b) => a.start - b.start) + const segs: HighlightedLine = [] + let cur = 0 + for (const mark of marks) { + if (mark.start > cur) { + segs.push({ text: line.slice(cur, mark.start), color: C.greyLight }) + } + segs.push(mark.seg) + cur = mark.end + } + if (cur < line.length) segs.push({ text: line.slice(cur), color: C.greyLight }) + return segs +} + +export function highlightMarkdown(text: string): HighlightedLine[] { + const out: HighlightedLine[] = [] + const lines = text.split('\n') + let inFence = false + let fenceLang = '' + let fenceLine: ((line: string) => HighlightedLine) | null = null + for (const raw of lines) { + if (inFence) { + if (FENCE_CLOSE_RE.test(raw)) { + out.push([{ text: raw, color: C.grey, dim: true }]) + inFence = false + fenceLang = '' + fenceLine = null + continue + } + out.push((fenceLine ?? highlightYamlLine)(raw)) + continue + } + const fenceOpen = raw.match(FENCE_OPEN_RE) + if (fenceOpen) { + inFence = true + fenceLang = fenceOpen[1] ?? 
'' + fenceLine = languageHighlighter(fenceLang) + out.push([{ text: raw, color: C.grey, dim: true }]) + continue + } + if (raw.length === 0) { + out.push([{ text: ' ' }]) + continue + } + const heading = raw.match(HEADING_RE) + if (heading) { + out.push([ + { text: heading[1] ?? '', color: C.orange, bold: true }, + { text: ' ' }, + { text: heading[2] ?? '', color: C.orangeBright, bold: true }, + ]) + continue + } + const ulist = raw.match(ULIST_RE) + if (ulist) { + out.push([ + { text: ulist[1] ?? '' }, + { text: ulist[2] ?? '', color: C.orange, bold: true }, + { text: ulist[3] ?? '' }, + ...highlightInlineMarkdown(ulist[4] ?? ''), + ]) + continue + } + const olist = raw.match(OLIST_RE) + if (olist) { + out.push([ + { text: olist[1] ?? '' }, + { text: olist[2] ?? '', color: C.orange, bold: true }, + { text: olist[3] ?? '' }, + ...highlightInlineMarkdown(olist[4] ?? ''), + ]) + continue + } + out.push(highlightInlineMarkdown(raw)) + } + return out +} + +// ── Mixed run output ───────────────────────────────────────────── +// +// What an agent produces during a multi-turn run is a mix of : +// - prose +// - fenced ```forge:bash / forge:write / forge:read / ... blocks +// - injected [forge:tool] / [/forge:tool] markers framing the +// result of the previous tool call (raw stdout, often shell-y) +// +// We treat the markers like another fence type : everything between +// [forge:tool] and [/forge:tool] is rendered with a dim, distinct +// colour so the user can tell tool output from the agent's narration. 
+ +const TOOL_OPEN_RE = /^\[forge:tool\]\s*$/ +const TOOL_CLOSE_RE = /^\[\/forge:tool\]\s*$/ + +export function highlightAgentRun(text: string): HighlightedLine[] { + const out: HighlightedLine[] = [] + const lines = text.split('\n') + let inFence = false + let fenceLine: ((line: string) => HighlightedLine) | null = null + let inTool = false + + for (const raw of lines) { + if (inFence) { + if (FENCE_CLOSE_RE.test(raw)) { + out.push([{ text: raw, color: C.grey, dim: true }]) + inFence = false + fenceLine = null + continue + } + out.push((fenceLine ?? highlightYamlLine)(raw)) + continue + } + if (inTool) { + if (TOOL_CLOSE_RE.test(raw)) { + out.push([{ text: raw, color: C.grey, dim: true }]) + inTool = false + continue + } + // Tool output is opaque shell-ish content. Render as plain + // grey so it stays readable but visually quieter than + // the agent's prose / blocks. + out.push([{ text: raw.length > 0 ? raw : ' ', color: C.grey }]) + continue + } + if (TOOL_OPEN_RE.test(raw)) { + out.push([{ text: raw, color: C.grey, dim: true }]) + inTool = true + continue + } + const fenceOpen = raw.match(FENCE_OPEN_RE) + if (fenceOpen) { + inFence = true + fenceLine = languageHighlighter(fenceOpen[1] ?? '') + out.push([{ text: raw, color: C.orange, bold: true }]) + continue + } + if (raw.length === 0) { + out.push([{ text: ' ' }]) + continue + } + out.push(highlightInlineMarkdown(raw)) + } + return out +} From 08da707df779ee34999ddaa86d56941f0abd7839 Mon Sep 17 00:00:00 2001 From: Georges Garnier Date: Mon, 27 Apr 2026 16:06:51 +0200 Subject: [PATCH 10/11] feat(cli): Mission Control compact mode + scrollable viewport MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit When several agents run in a session, the panel was stacking 6+ fully-expanded cards and overflowing the terminal. Two changes : 1. Compact mode by default. Each non-focused card now renders as a single line : badge + verb + target.
Borders disappear, the terminal stays calm. The focused card expands to its full preview like before. Cards in 'running' status stay expanded too, so a streaming agent run remains visible without having to Tab to it. 2. Bounded viewport. Mission Control now takes a panelHeight prop (computed by App from the terminal rows minus a Welcome floor and a spacer) and slices the action list to fit. Truncated actions show as "↑ N above / ↓ N below" hints in the panel header. Welcome stays glued to the bottom with flexShrink=0, so the panel is what gives way on small terminals. useCardFocus extended : - scrollTop : action-index offset, advanced via PgUp/PgDn ; - auto-focus the last new arrival when nothing is focused (the user immediately sees what the builder just produced) ; - auto-scroll lower bound : focusing an action above scrollTop bumps scrollTop down to keep it visible. App now routes PgUp/PgDn to the Mission Control scroll when there are actions and the prompt is empty (or a card is focused). It keeps falling back to the chat transcript scroll otherwise. Tab, Shift+Tab, Enter, Esc unchanged. --- packages/cli/src/components/App.tsx | 61 +++++-- .../cli/src/components/MissionControl.tsx | 156 ++++++++++++++++-- packages/cli/src/hooks/useCardFocus.ts | 123 +++++++++++--- 3 files changed, 286 insertions(+), 54 deletions(-) diff --git a/packages/cli/src/components/App.tsx b/packages/cli/src/components/App.tsx index c85a8c9..01c4ef8 100644 --- a/packages/cli/src/components/App.tsx +++ b/packages/cli/src/components/App.tsx @@ -8,10 +8,14 @@ // │ Welcome │ header + transcript + (confirm dialog OR prompt) + footer // └──────────────┘ ← terminal bottom (FIXED) // -// PgUp / PgDn / Ctrl+E scroll the chat transcript inside Welcome. -// Tab / Shift+Tab cycle focus through Mission Control cards (only when -// the prompt input is empty so it doesn't fight TextInput). Enter on a -// focused card opens a full-screen CardDetail view ; Esc closes it. 
+// Scroll responsibilities : +// - Welcome's chat transcript : PgUp/PgDn/Ctrl+E when no card is focused +// AND no Mission Control scroll is needed. +// - Mission Control panel : PgUp/PgDn when focus is inside the panel +// (or, more simply, when there are more actions than fit and the +// prompt is empty). +// - Tab/Shift+Tab cycle the focused card. Enter opens the detail +// view. Esc unfocuses. The detail view is a full-screen modal. import { Box, useInput, useStdin } from 'ink' import React from 'react' @@ -24,6 +28,12 @@ import { ProviderLogo } from './ProviderLogo.tsx' import { Splash } from './Splash.tsx' import { Welcome } from './Welcome.tsx' +// Keep Welcome's bottom block (header + transcript + prompt + footer) +// at this minimum height ; everything above goes to Mission Control. +const WELCOME_MIN_HEIGHT = 12 +// Reserve a few rows above Welcome for the spacer + provider logo. +const SPACER_HEIGHT = 4 + export function App(): React.JSX.Element { const { lang } = useLanguage() const { isRawModeSupported } = useStdin() @@ -36,9 +46,14 @@ export function App(): React.JSX.Element { const hasActions = state.actions.length > 0 const promptIsEmpty = promptDraft.length === 0 - // Tab/Enter is only meaningful when there are actions, the prompt is - // empty (so TextInput doesn't lose its keystrokes), and no permission - // dialog is showing. + // Mission Control gets whatever is left after Welcome and the + // spacer/logo claim their slots. Floor at 6 so the panel never + // collapses below "header + 1 card line + truncation hints". 
+ const panelHeight = Math.max( + 6, + rows - WELCOME_MIN_HEIGHT - SPACER_HEIGHT, + ) + const cardKeysActive = isRawModeSupported && lang !== null && @@ -49,15 +64,26 @@ export function App(): React.JSX.Element { useInput( (input, key) => { - if (key.pageUp) scrollUp() - else if (key.pageDown) scrollDown() - else if (key.ctrl && input === 'e') scrollToBottom() - else if (cardKeysActive && key.tab && key.shift) focus.cycleBack() + // PgUp/PgDn : when a card is focused OR there's nothing in the + // prompt and we have actions, scroll Mission Control. Otherwise + // scroll the chat transcript (legacy behaviour). + if (key.pageUp) { + if (cardKeysActive || focus.focusedId !== null) focus.scrollUp() + else scrollUp() + return + } + if (key.pageDown) { + if (cardKeysActive || focus.focusedId !== null) focus.scrollDown() + else scrollDown() + return + } + if (key.ctrl && input === 'e') { + scrollToBottom() + return + } + if (cardKeysActive && key.tab && key.shift) focus.cycleBack() else if (cardKeysActive && key.tab) focus.cycle() else if (cardKeysActive && key.return) focus.open() - // Esc clears the card focus (only when something is focused and - // the prompt is empty, so we never swallow an Esc the user meant - // for cancelling input). else if ( key.escape && promptIsEmpty && @@ -82,7 +108,12 @@ export function App(): React.JSX.Element { {hasActions ? ( - + ) : ( )} diff --git a/packages/cli/src/components/MissionControl.tsx b/packages/cli/src/components/MissionControl.tsx index 2a61570..e991dd3 100644 --- a/packages/cli/src/components/MissionControl.tsx +++ b/packages/cli/src/components/MissionControl.tsx @@ -1,12 +1,16 @@ // MissionControl — fills the top zone whenever there is at least one -// builder action (write or run). Replaces the splash screen for the rest -// of the session. +// builder action. 
Two display modes per card : // -// Each action gets a card with : -// - a status badge (proposed / running / done / failed) -// - the target (file path or agent name) -// - a syntax-highlighted preview of the content (YAML for AGENT.md, -// plain for prompts) or the streaming agent output +// - compact (default for unfocused cards) : 1 terminal line, badge + +// verb + target, kept together with a thin border. +// - expanded (focused card, or any card whose status is 'running' so +// a streaming output stays visible) : the full preview panel as +// before. +// +// The panel itself is bounded : it accepts a panelHeight prop and +// renders only the slice of cards starting at scrollTop that fits +// within that height. Truncation is signalled by "↑ N above / +// ↓ N below" hints in the header. import { Box, Text } from 'ink' import React from 'react' @@ -132,6 +136,39 @@ function FocusMarker({ focused }: { focused: boolean }): React.JSX.Element { ) } +// ── Compact row : single line for unfocused cards ───────────────── + +function verbFor(action: Action): string { + if (action.kind === 'write') return 'write' + if (action.kind === 'run') return 'run' + return 'skill' +} + +function targetFor(action: Action): string { + if (action.kind === 'write') return action.path + if (action.kind === 'run') return action.agent + return action.skill +} + +function CompactRow({ + action, + focused, +}: { + action: Action + focused: boolean +}): React.JSX.Element { + return ( + + + + {` ${verbFor(action).padEnd(5, ' ')} `} + {targetFor(action)} + + ) +} + +// ── Expanded cards ──────────────────────────────────────────────── + function WriteCard({ action, focused, @@ -236,22 +273,96 @@ function SkillCard({ ) } +// ── Layout : how many lines does a card need ? ──────────────────── + +const COMPACT_HEIGHT = 1 + +function expandedHeight(action: Action): number { + // Empirical estimate ; we don't try to be exact, we want a stable + // upper bound so the panel can budget rows. 
+ if (action.kind === 'write') { + // CardFrame border 2, header 1, marginTop 1, body up to 14, hint 1+1 = ~20 + return 20 + } + if (action.kind === 'run') { + // border 2, header 1, prompt label 1, prompt up to 6, output label 1, output up to 14, error 1 = ~26 + return 26 + } + // skill : border 2, header 1, description ~1, loaded hint 1 = ~7 + return 7 +} + +function heightOf( + action: Action, + focused: boolean, +): number { + if (focused) return expandedHeight(action) + // Running cards stay expanded so a streaming agent run stays visible. + if (action.status === 'running') return expandedHeight(action) + return COMPACT_HEIGHT + 1 /* paddingY around row */ +} + +// ── Slicing : start at scrollTop, fit within panelHeight ────────── + +type Slice = { + visible: Action[] + hiddenAbove: number + hiddenBelow: number +} + +function sliceForViewport({ + actions, + focusedId, + scrollTop, + panelHeight, +}: { + actions: Action[] + focusedId: string | null + scrollTop: number + panelHeight: number +}): Slice { + const start = Math.min(Math.max(0, scrollTop), Math.max(0, actions.length - 1)) + const visible: Action[] = [] + let used = 0 + for (let i = start; i < actions.length; i += 1) { + const a = actions[i] as Action + const h = heightOf(a, a.id === focusedId) + if (used + h > panelHeight && visible.length > 0) break + visible.push(a) + used += h + if (used >= panelHeight) break + } + return { + visible, + hiddenAbove: start, + hiddenBelow: Math.max(0, actions.length - start - visible.length), + } +} + export function MissionControl({ actions, focusedId, + scrollTop, + panelHeight, }: { actions: Action[] focusedId: string | null + scrollTop: number + panelHeight: number }): React.JSX.Element { const cols = process.stdout.columns ?? 80 + // Reserve 2 rows for the header + truncation hints, the rest is body. 
+ const bodyHeight = Math.max(3, panelHeight - 2) + const slice = sliceForViewport({ + actions, + focusedId, + scrollTop, + panelHeight: bodyHeight, + }) + return ( - - + + {' ▌▌ MISSION CONTROL ▐▐ '} @@ -268,12 +379,27 @@ export function MissionControl({ )} - {actions.map((a) => { + + {slice.hiddenAbove > 0 ? ( + + {` ↑ ${slice.hiddenAbove.toString()} action${slice.hiddenAbove === 1 ? '' : 's'} above`} + + ) : null} + + {slice.visible.map((a) => { const focused = a.id === focusedId + const expand = focused || a.status === 'running' + if (!expand) return if (a.kind === 'write') return if (a.kind === 'run') return return })} + + {slice.hiddenBelow > 0 ? ( + + {` ↓ ${slice.hiddenBelow.toString()} action${slice.hiddenBelow === 1 ? '' : 's'} below`} + + ) : null} ) } diff --git a/packages/cli/src/hooks/useCardFocus.ts b/packages/cli/src/hooks/useCardFocus.ts index a40bd4a..152c6e1 100644 --- a/packages/cli/src/hooks/useCardFocus.ts +++ b/packages/cli/src/hooks/useCardFocus.ts @@ -1,47 +1,93 @@ -// Mission Control card focus + detail view state. +// Mission Control card focus + scroll + detail view state. // -// Kept separate from useChat so the chat hook stays focused on -// conversation/action state. Exposes : -// - focusedId : id of the action currently highlighted (or null) -// - detailOpen : whether the full-screen detail panel is mounted -// - cycle / cycleBack / open / close : the actions wired to Tab keys +// Focus : +// - Tab from "no focus" → focus the LAST action (most recent). +// - Tab again → walk forward (wraps). +// - Shift+Tab → walk backward (wraps). +// - Esc clears focus (keep card content visible, just unhighlight). +// - When the focused action disappears, drop focus. // -// Behaviour : -// - Tab from "no focus" → focus the LAST action (most recent on top -// of Mission Control reads as bottom of the list, so we land on -// what the user just saw). -// - Tab again → walk forward; wraps around. -// - Shift+Tab → walk backward; wraps around. 
-// - When the focused action disappears (cleared, etc.), focus resets. - -import { useCallback, useEffect, useState } from 'react' +// Auto-focus : +// - When a new action arrives and nothing is focused, auto-focus +// the new one so the user immediately sees what the builder did. +// - We track the last seen action ids in a ref to detect "new". +// +// Scroll : +// - scrollTop is an action-INDEX offset. The Mission Control panel +// slices `actions.slice(scrollTop, …)` to fit panelHeight. +// - cycle / cycleBack adjust scrollTop when the focused index moves +// out of the visible window. The visible window size depends on +// the panel layout, which we don't know here ; we use a +// conservative heuristic : keep the focused index >= scrollTop. +// - scrollUp / scrollDown / scrollHome / scrollEnd let App expose +// PgUp / PgDn / Home / End to the user when no card is focused. + +import { useCallback, useEffect, useRef, useState } from 'react' import type { Action } from '../actions/types.ts' export type CardFocusApi = { focusedId: string | null detailOpen: boolean + scrollTop: number cycle: () => void cycleBack: () => void open: () => void close: () => void clearFocus: () => void + scrollUp: () => void + scrollDown: () => void + scrollHome: () => void + scrollEnd: () => void } export function useCardFocus(actions: Action[]): CardFocusApi { const [focusedId, setFocusedId] = useState<string | null>(null) const [detailOpen, setDetailOpen] = useState(false) + const [scrollTop, setScrollTop] = useState(0) + + // Remember the previous action ids so we can detect new arrivals + // without firing on every render (initial mount included). + const prevIdsRef = useRef<Set<string>>(new Set()) - // If the focused action disappears (e.g. /clear), drop focus and the - // detail panel together so we never display a stale card. + // Auto-focus the most recent action when one shows up and nothing + // is focused yet. Also trims focus / scroll when actions vanish.
useEffect(() => { - if (focusedId === null) return - const stillThere = actions.some((a) => a.id === focusedId) - if (!stillThere) { + const currentIds = new Set(actions.map((a) => a.id)) + // Find ids that weren't there last render — new arrivals. + const newIds: string[] = [] + for (const a of actions) { + if (!prevIdsRef.current.has(a.id)) newIds.push(a.id) + } + prevIdsRef.current = currentIds + + if (focusedId !== null && !currentIds.has(focusedId)) { setFocusedId(null) setDetailOpen(false) } + + // Auto-focus the latest new arrival, but only if nothing is + // currently focused (don't steal focus mid-cycle). + if (newIds.length > 0 && focusedId === null) { + const last = newIds[newIds.length - 1] + if (last !== undefined) setFocusedId(last) + } + + // Keep scrollTop within bounds. + setScrollTop((st) => Math.max(0, Math.min(st, Math.max(0, actions.length - 1)))) }, [actions, focusedId]) + // Scroll-to-focus : whenever focusedId changes, make sure scrollTop + // is at most the focused index (so the focused card is at or below + // the panel's first visible slot). The panel itself caps scrollTop + // upward when the focused card would fall below the bottom edge — + // we don't know panelHeight here, so we keep a lower bound only. + useEffect(() => { + if (focusedId === null) return + const idx = actions.findIndex((a) => a.id === focusedId) + if (idx === -1) return + setScrollTop((st) => (idx < st ? idx : st)) + }, [focusedId, actions]) + const cycle = useCallback(() => { if (actions.length === 0) return setFocusedId((current) => { @@ -58,9 +104,7 @@ export function useCardFocus(actions: Action[]): CardFocusApi { const cycleBack = useCallback(() => { if (actions.length === 0) return setFocusedId((current) => { - if (current === null) { - return actions[0]?.id ?? null - } + if (current === null) return actions[0]?.id ?? null const idx = actions.findIndex((a) => a.id === current) if (idx === -1) return actions[0]?.id ?? 
null const prev = (idx - 1 + actions.length) % actions.length @@ -81,5 +125,36 @@ export function useCardFocus(actions: Action[]): CardFocusApi { setDetailOpen(false) }, []) - return { focusedId, detailOpen, cycle, cycleBack, open, close, clearFocus } + const scrollUp = useCallback(() => { + setScrollTop((st) => Math.max(0, st - 1)) + }, []) + + const scrollDown = useCallback(() => { + setScrollTop((st) => + Math.min(Math.max(0, actions.length - 1), st + 1), + ) + }, [actions.length]) + + const scrollHome = useCallback(() => { + setScrollTop(0) + }, []) + + const scrollEnd = useCallback(() => { + setScrollTop(Math.max(0, actions.length - 1)) + }, [actions.length]) + + return { + focusedId, + detailOpen, + scrollTop, + cycle, + cycleBack, + open, + close, + clearFocus, + scrollUp, + scrollDown, + scrollHome, + scrollEnd, + } } From 68225dcd22abfe6f66aba18dd62f74799a002e2b Mon Sep 17 00:00:00 2001 From: Georges Garnier Date: Mon, 27 Apr 2026 17:46:59 +0200 Subject: [PATCH 11/11] docs: bump readmes to P4 + P6 (native tools + skill layer) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Status badge moves to P6 done. Roadmap table lifts P4 and P6 to ✅, points to P5 (hardened sandbox + persistent agents + artifact extraction) as the next milestone. Root README (EN/FR) gains : - Native tools section : six-tool table (Bash, FileWrite, FileRead, FileEdit, Grep, Glob) with their tags and limits ; a short note explaining the choice of a text-structured forge:* protocol over OpenAI tool_calls. - Skills section : SKILL.md format, two sources (built-in and ~/.agent-forge/skills/), server-side matcher, two-call runner, scaffold-and-run as the first built-in. - Mission Control keyboard cheatsheet : Tab / Enter / Esc / PgUp/PgDn / Ctrl+E. - /skills slash command listed. - Architecture diagram updated : skill catalog + runner on the host side, /workspace mount + tool loop on the container side, persistence of the workspace dir after exit. 
- Repo structure shows packages/core/src/builder/skills/, runtime/src/tool-protocol.ts, and the runtime/ subdir under tools-core. Sub-package READMEs realigned : - packages/cli : compact / expanded card mode, scrollable viewport, focus + auto-scroll, detail view, full keyboard map, dispatch skill server-side mention. - packages/core : skill catalog / matcher / runner files listed, scaffold-and-run noted as built-in. - packages/runtime : multi-turn tool loop documented, six forge:* tags, [forge:tool] markers on stdout, FORGE_MAX_TOKENS env var. - packages/tools-core : separate "host tools" and "runtime tools" sections ; six runtime tools with their constraints ; test layout listed. --- README.fr.md | 90 ++++++++++++++++++++++++++++------- README.md | 90 ++++++++++++++++++++++++++++------- packages/cli/README.md | 44 ++++++++++------- packages/core/README.md | 17 ++++--- packages/runtime/README.md | 23 ++++++--- packages/tools-core/README.md | 47 ++++++++++-------- 6 files changed, 226 insertions(+), 85 deletions(-) diff --git a/README.fr.md b/README.fr.md index 167913f..64c8456 100644 --- a/README.fr.md +++ b/README.fr.md @@ -7,7 +7,7 @@ **Forgez, lancez et orchestrez des agents LLM en sandbox.** [![License: Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](./LICENSE) - ![Status: P3 done](https://img.shields.io/badge/status-P3%20done-green) + ![Status: P6 done](https://img.shields.io/badge/status-P6%20done-green) ![Stack: TypeScript + Bun](https://img.shields.io/badge/stack-TypeScript_+_Bun-3178c6) 🇫🇷 Version française · [🇬🇧 English version](./README.md) @@ -16,7 +16,7 @@ --- -> 🚧 **Statut — POC, jalon P3 atteint.** Vous pouvez désormais lancer `bun run forge`, décrire un agent en français ou en anglais, regarder le builder rédiger l'`AGENT.md`, l'approuver, puis demander au builder d'exécuter cet agent — il monte son propre container Docker, streame la sortie, puis détruit la sandbox. 
Prochain jalon : P4 — tools natifs (Bash, FileRead, FileEdit, FileWrite, Grep, Glob). +> 🚧 **Statut — POC, jalons P1 → P6 atteints.** Vous pouvez désormais lancer `bun run forge`, décrire un agent en français ou en anglais, regarder le builder rédiger l'`AGENT.md`, l'approuver, puis demander au builder d'exécuter cet agent — il monte son propre container Docker avec **six tools natifs** (Bash, FileRead, FileEdit, FileWrite, Grep, Glob) sandboxés sous `/workspace`, streame la sortie, puis détruit la sandbox. Les patterns d'orchestration récurrents sont gérés par des **skills** : déposez un `SKILL.md` dans `~/.agent-forge/skills/` (ou utilisez la skill built-in `scaffold-and-run`) et la CLI active automatiquement quand un trigger apparaît dans votre message. Prochain jalon : P5 — sandbox durci + extraction d'artefacts. ## Qu'est-ce qu'Agent Forge ? @@ -35,9 +35,9 @@ Le builder est la seule surface conversationnelle. Les sous-agents sont créés | **P1** | Hello agent dans Docker (script host ↔ container ↔ round-trip LLM) | ✅ fait | | **P2** | CLI conversationnelle (REPL Ink, EN/FR, slash commands, switch provider) | ✅ fait | | **P3** | Le builder écrit l'`AGENT.md`, demande la permission, lance l'agent dans un container neuf, streame la sortie | ✅ fait | -| P4 | Six tools natifs (Bash, FileRead, FileEdit, FileWrite, Grep, Glob) utilisables depuis la sandbox | suivant | -| P5 | Sandbox durci + extraction d'artefacts vers le host | | -| P6 | Skills enrichis (scaffolding projet, audits, fixes) | | +| **P4** | Six tools natifs sandboxés sous `/workspace` : Bash, FileWrite, FileRead, FileEdit, Grep, Glob ; tool-loop runtime avec `maxTurns` | ✅ fait | +| **P6** | Couche skills : format `SKILL.md`, catalogue (built-in + `~/.agent-forge/skills/`), matching des triggers côté serveur, runner à 2 appels (un pour AGENT.md, un pour le run prompt) | ✅ fait | +| P5 | Sandbox durci + agents persistants (`docker exec`) + extraction d'artefacts vers le host | suivant | | P7 | `TEAM.md` 
— exécutions multi-agents coordonnées | | | P8 | Dashboard pixel art (activité agents en direct) | | | P9 | ★ POC validé : démo Next.js + Laravel + QA de bout en bout | | @@ -148,6 +148,36 @@ Vous pouvez aussi switcher à la volée depuis le REPL : `/provider mistral`, `/ Chaque session est persistée dans `~/.agent-forge/sessions//transcript.jsonl`. `/sessions` liste les sessions, `/session` affiche l'id courante. +## Tools natifs (dans la sandbox de l'agent) + +Les agents lancés par le builder tournent dans un container jetable avec `/workspace` monté en écriture. Six tools natifs sont exposés et appelés via des blocs encadrés `forge:*` que l'agent émet dans sa réponse : + +| Tag | Tool | Ce que ça fait | +|---|---|---| +| `forge:bash` | Bash | `bash -lc ` dans `/workspace`, timeout 30 s par défaut (max 120 s), sortie clippée à 16 Ko | +| `forge:write` | FileWrite | Crée ou écrase un fichier sous `/workspace`, dossiers parents auto-créés | +| `forge:read` | FileRead | Offset/limit en lignes, clip à 16 Ko, refuse les non-fichiers | +| `forge:edit` | FileEdit | Patch par sous-chaîne exacte ; refuse les matchs ambigus sauf `replaceAll: true` | +| `forge:grep` | Grep | Regex JS pure sur un filtre glob optionnel, ignore les binaires, 200 hits max | +| `forge:glob` | Glob | Matcher fait main pour `*` / `**` / `?`, 200 résultats max | + +Le runtime parse un bloc par tour, exécute, réinjecte le résultat structuré comme message système, et boucle jusqu'à `maxTurns` (cap dur à 10). Tous les tools sont sandboxés : path traversal, octets nuls et chemins absolus hors `/workspace` sont refusés. + +Pourquoi un protocole texte plutôt que les `tool_calls` natifs OpenAI ? Les LLM locaux (MLX, llama.cpp) ne respectent pas tous le tool-use natif, et un protocole unique entre builder et agents simplifie le débogage — le flux brut reste lisible. 
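
The "one block per turn" rule described in the Native tools section can be sketched as a small parser. This is an illustration only: `parseFirstToolBlock`, `ParseResult` and the regular expression are hypothetical names and choices, the real parser lives in `packages/runtime/src/tool-protocol.ts` and may differ, and the block body is kept as a raw string because its exact format is not shown here.

```typescript
// Hypothetical sketch of first-block-only parsing for fenced forge:* blocks.
// Names and regex are assumptions; see packages/runtime/src/tool-protocol.ts
// for the actual implementation.
type ToolName = 'bash' | 'write' | 'read' | 'edit' | 'grep' | 'glob'

type ParseResult =
  | { kind: 'none' }
  | { kind: 'invalid'; reason: string }
  | { kind: 'tool'; tool: ToolName; body: string }

const TOOL_NAMES: readonly string[] = ['bash', 'write', 'read', 'edit', 'grep', 'glob']

// Build the fence marker at runtime so this snippet can itself sit in a fence.
const TICKS = '`'.repeat(3)

// Non-global regex: exec() returns only the first fenced forge:* block.
const FENCE = new RegExp(TICKS + 'forge:([^\\s]+)[^\\n]*\\n([\\s\\S]*?)' + TICKS)

function parseFirstToolBlock(reply: string): ParseResult {
  const m = FENCE.exec(reply)
  if (m === null) return { kind: 'none' } // plain prose: the loop can stop
  const name = m[1] ?? ''
  const body = m[2] ?? ''
  if (!TOOL_NAMES.includes(name)) {
    return { kind: 'invalid', reason: 'unknown tool tag forge:' + name }
  }
  // Drop the single newline that precedes the closing fence.
  return { kind: 'tool', tool: name as ToolName, body: body.replace(/\n$/, '') }
}
```

Any later blocks in the same reply are ignored by this sketch, which matches the first-block-only behaviour listed in the P4 tests.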
+ +## Skills (patterns d'orchestration récurrents) + +Un seul message utilisateur peut mélanger deux intentions que le LLM tend à confondre — « ce que l'agent EST » et « ce que l'agent doit FAIRE MAINTENANT ». Les **skills** les séparent. + +Une skill est un fichier `SKILL.md` avec un frontmatter YAML (name, description, **triggers**, actions) et un corps markdown d'instructions. La CLI charge les skills depuis deux sources : + +- built-in : livrées sous `packages/core/src/builder/skills/` +- utilisateur : posez un fichier dans `~/.agent-forge/skills/.md` (ou `/SKILL.md` pour grouper des assets) et il prend le pas sur le built-in en cas de collision de nom + +Quand vous envoyez un message, la CLI le scanne côté serveur contre les phrases triggers de chaque skill (insensible à la casse, sous-chaîne). Si un trigger matche, le **runner** prend la main : deux appels LLM ciblés, un pour l'AGENT.md (rôle générique uniquement), un pour le run prompt (la tâche concrète), puis les deux blocs apparaissent en cards PROPOSED dans Mission Control. Vous approuvez dans l'ordre. Le LLM n'a jamais à prendre la méta-décision. + +La skill `scaffold-and-run` est livrée par défaut : elle se déclenche sur des mots comme `audite`, `teste`, `lance puis`, `audit`, `test it`, `then run`, `create and run`. Tapez `/skills` dans le REPL pour lister celles qui sont disponibles. 
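
The catalog and trigger rules from the Skills section (user skills override built-ins on name collision, triggers match as case-insensitive substrings) can be sketched as follows. The `Skill` shape and the `mergeCatalog` / `matchSkill` names are hypothetical, and returning the first matching skill is an assumption; the real logic is in `skill-catalog.ts` and `skill-matcher.ts`.

```typescript
// Hypothetical sketch of catalog merging and trigger matching.
// Types and function names are assumptions, not the project's actual API.
type Skill = {
  name: string
  triggers: string[]
  source: 'built-in' | 'user'
}

// User skills replace built-in skills that share the same name.
function mergeCatalog(builtIn: Skill[], user: Skill[]): Skill[] {
  const byName = new Map<string, Skill>()
  for (const s of builtIn) byName.set(s.name, s)
  for (const s of user) byName.set(s.name, s)
  return Array.from(byName.values())
}

// Case-insensitive substring match. Assumption: the first skill with a
// matching trigger wins; the real matcher may rank matches differently.
function matchSkill(message: string, skills: Skill[]): Skill | null {
  const haystack = message.toLowerCase()
  for (const skill of skills) {
    for (const trigger of skill.triggers) {
      const needle = trigger.trim().toLowerCase()
      if (needle !== '' && haystack.includes(needle)) return skill
    }
  }
  return null
}
```

Because the match runs on the host before any LLM call, a message with no trigger simply falls through to the normal builder conversation.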
+ ## Slash commands utiles ``` @@ -159,9 +189,18 @@ Chaque session est persistée dans `~/.agent-forge/sessions//transcript.json /model change de modèle sur le provider actif /session affiche l'id de la session courante /sessions liste les sessions persistées +/skills liste les skills disponibles (built-in + user) /exit quitte ``` +## Raccourcis Mission Control + +- `Tab` / `Shift+Tab` — cycle le focus entre les cards d'action +- `Enter` — ouvre la card focus en plein écran +- `Esc` — retire le focus (ou ferme la vue détail) +- `↑↓ / PgUp / PgDn / g / G` — scroll dans la vue détail +- `Ctrl+E` — retour live dans le transcript + ## Architecture ``` @@ -170,23 +209,32 @@ Chaque session est persistée dans `~/.agent-forge/sessions//transcript.json │ │ │ forge CLI (= le builder LLM) │ │ ├─ TUI Ink (Mission Control + conversation) │ -│ ├─ Parser AGENT.md (frontmatter validé par Zod) │ -│ ├─ Tool FileWrite (sandboxé sous ~/.agent-forge) │ +│ ├─ Catalogue skills : built-in + ~/.agent-forge/skills/ │ +│ ├─ Matcher de triggers + skill runner côté serveur │ +│ ├─ Parsers AGENT.md / SKILL.md (validés par Zod) │ +│ ├─ Tool FileWrite (host, sandboxé sous ~/.agent-forge) │ │ └─ Tool DockerLaunch (lance des containers one-shot) │ └────────────────────┬────────────────────────────────────────┘ │ docker run --rm -i + │ -v /AGENT.md:/agent/AGENT.md:ro + │ -v :/runtime:ro + │ -v :/workspace ▼ ┌─────────────────────────────────────────────────────────────┐ │ CONTAINER (un par run d'agent, jetable) │ │ agent-forge/base:latest │ │ │ │ Runtime Node ── lit /agent/AGENT.md comme system prompt │ -│ └─ reçoit le prompt utilisateur via stdin │ -│ └─ streame la réponse du LLM sur stdout │ +│ ├─ reçoit le prompt utilisateur via stdin │ +│ ├─ streame la réponse du LLM sur stdout │ +│ └─ tool loop : forge:bash / write / read / │ +│ edit / grep / glob, capé à maxTurns │ +│ │ +│ /workspace ── espace en écriture, conservé après l'exit │ └─────────────────────────────────────────────────────────────┘ 
``` -Les agents persistants (`docker exec`) et les teams multi-agents (un container, plusieurs process coordonnés via [`claude-presence`](https://github.com/garniergeorges/claude-presence)) arrivent en P5 et P7. +Les agents persistants (`docker exec` au lieu de `docker run --rm`) et les teams multi-agents (un container, plusieurs process coordonnés via [`claude-presence`](https://github.com/garniergeorges/claude-presence)) arrivent en P5 et P7. ## Stack technique @@ -203,14 +251,20 @@ Les agents persistants (`docker exec`) et les teams multi-agents (un container, ``` agent-forge/ ├── packages/ -│ ├── core/ # builder LLM, schéma AGENT.md, config provider -│ ├── cli/ # le binaire `forge` (REPL Ink + Mission Control) -│ ├── runtime/ # bundle exécuté dans chaque container d'agent -│ └── tools-core/ # FileWrite, DockerLaunch, … -├── docker/ # Dockerfiles -├── scripts/ # helpers de build (docker, hooks) -├── demo-sprites/ # mockup interactif (référence UX) -└── assets/ # images du README +│ ├── core/ # builder LLM, schémas, couche skills +│ │ └── src/builder/skills/ # fichiers SKILL.md built-in +│ ├── cli/ # le binaire `forge` (REPL Ink + Mission Control) +│ ├── runtime/ # bundle exécuté dans chaque container d'agent +│ │ └── src/tool-protocol.ts # parser forge:* + render des résultats +│ └── tools-core/ +│ ├── file-write.ts # FileWrite host (~/.agent-forge) +│ ├── docker-launch.ts # lanceur de containers one-shot +│ └── runtime/ # tools in-container : bash, file-write, +│ # file-read, file-edit, grep, glob +├── docker/ # Dockerfiles +├── scripts/ # helpers de build (docker, hooks) +├── demo-sprites/ # mockup interactif (référence UX) +└── assets/ # images du README ``` ## Genèse diff --git a/README.md b/README.md index c0f7711..ab7c624 100644 --- a/README.md +++ b/README.md @@ -7,7 +7,7 @@ **Forge, run, and orchestrate sandboxed LLM agents.** [![License: Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](./LICENSE) - ![Status: P3 
done](https://img.shields.io/badge/status-P3%20done-green) + ![Status: P6 done](https://img.shields.io/badge/status-P6%20done-green) ![Stack: TypeScript + Bun](https://img.shields.io/badge/stack-TypeScript_+_Bun-3178c6) 🇬🇧 English version · [🇫🇷 Version française](./README.fr.md) @@ -16,7 +16,7 @@ --- -> 🚧 **Status — POC, milestone P3 reached.** You can now `bun run forge`, describe an agent in plain English or French, watch the builder draft the `AGENT.md`, approve it, then ask the builder to run that agent — it spins up its own Docker container, streams the output, and tears the sandbox down. Next milestone : P4 — native tools (Bash, FileRead, FileEdit, FileWrite, Grep, Glob). +> 🚧 **Status — POC, milestones P1 → P6 reached.** You can now `bun run forge`, describe an agent in plain English or French, watch the builder draft the `AGENT.md`, approve it, then ask the builder to run that agent — it spins up its own Docker container with **six native tools** (Bash, FileRead, FileEdit, FileWrite, Grep, Glob) sandboxed under `/workspace`, streams the output, and tears the sandbox down. Recurring orchestration patterns are handled by **skills** : drop a `SKILL.md` in `~/.agent-forge/skills/` (or use the built-in `scaffold-and-run`) and the CLI auto-dispatches when a trigger phrase appears in your message. Next milestone : P5 — hardened sandbox + artifact extraction. ## What is Agent Forge ? @@ -35,9 +35,9 @@ The builder is the only conversational surface. 
Sub-agents are spawned on demand | **P1** | Hello agent in Docker (host script ↔ container ↔ LLM round-trip) | ✅ done | | **P2** | Conversational CLI (REPL Ink, EN/FR, slash commands, provider switch) | ✅ done | | **P3** | Builder writes `AGENT.md`, asks for permission, launches the agent in a fresh container, streams its output | ✅ done | -| P4 | Six native tools (Bash, FileRead, FileEdit, FileWrite, Grep, Glob) usable from inside the sandbox | next | -| P5 | Hardened sandbox + artifact extraction back to host | | -| P6 | Skills enriched (project scaffolding, audits, fixes) | | +| **P4** | Six native tools sandboxed under `/workspace` : Bash, FileWrite, FileRead, FileEdit, Grep, Glob ; runtime tool-loop with `maxTurns` | ✅ done | +| **P6** | Skill layer : `SKILL.md` format, catalog (built-in + `~/.agent-forge/skills/`), server-side trigger matching, two-call runner (one for AGENT.md, one for the run prompt) | ✅ done | +| P5 | Hardened sandbox + persistent agents (`docker exec`) + artifact extraction back to host | next | | P7 | `TEAM.md` — coordinated multi-agent runs | | | P8 | Pixel-art dashboard (live agent activity) | | | P9 | ★ POC validated : Next.js + Laravel + QA demo end-to-end | | @@ -148,6 +148,36 @@ You can also switch on the fly inside the REPL : `/provider mistral`, `/model mi Every session is persisted to `~/.agent-forge/sessions//transcript.jsonl`. Use `/sessions` to list, `/session` to show the current id. +## Native tools (inside the agent sandbox) + +Agents launched by the builder run inside a disposable container with `/workspace` mounted as their writable root. 
Six native tools are exposed and called via fenced `forge:*` blocks the agent emits in its reply : + +| Tag | Tool | What it does | +|---|---|---| +| `forge:bash` | Bash | `bash -lc ` inside `/workspace`, 30 s default timeout (max 120 s), output clipped at 16 KB | +| `forge:write` | FileWrite | Create or overwrite a file under `/workspace`, parent dirs auto-created | +| `forge:read` | FileRead | Line-based offset/limit, 16 KB clip, fails on non-regular files | +| `forge:edit` | FileEdit | Exact-substring patch ; refuses ambiguous matches unless `replaceAll: true` | +| `forge:grep` | Grep | Pure JS regex over an optional glob filter, skips binaries, 200 hits cap | +| `forge:glob` | Glob | Hand-rolled `*` / `**` / `?` matcher, 200 results cap | + +The runtime parses one block per turn, executes it, feeds the structured result back as a system message, and loops up to `maxTurns` (capped at 10). All tools are sandboxed : path traversal, null bytes and absolute paths outside `/workspace` are refused. + +Why a text-structured protocol instead of OpenAI `tool_calls` ? Local LLMs (MLX, llama.cpp) don't all honour native tool-use, and a single protocol across builder and agents is easier to debug — the raw stream stays human-readable. + +## Skills (recurring orchestration patterns) + +A single user message can mix two intents the LLM tends to collapse — "what the agent IS" and "what the agent should do RIGHT NOW". **Skills** keep them apart. + +A skill is a `SKILL.md` file with a YAML frontmatter (name, description, **triggers**, actions) and a markdown body of instructions. The CLI loads skills from two sources : + +- built-in : shipped under `packages/core/src/builder/skills/` +- user : drop a file into `~/.agent-forge/skills/.md` (or `/SKILL.md` for grouped assets) and it overrides the built-in on name collision + +When you send a message, the CLI scans it server-side against every skill's trigger phrases (case-insensitive substring). 
If one matches, the skill **runner** takes over the turn : two narrow LLM calls, one for the AGENT.md (generic role only), one for the run prompt (the concrete task), then both blocks land as PROPOSED cards in Mission Control. You approve them in order. The LLM never has to make the meta-decision. + +Built-in `scaffold-and-run` ships today : it triggers on words like `audite`, `teste`, `lance puis`, `audit`, `test it`, `then run`, `create and run`. Type `/skills` in the REPL to list what's available. + ## Useful slash commands ``` @@ -159,9 +189,18 @@ Every session is persisted to `~/.agent-forge/sessions//transcript.jsonl`. U /model switch model on the active provider /session show the current session id /sessions list persisted sessions +/skills list available skills (built-in + user) /exit quit ``` +## Mission Control keyboard + +- `Tab` / `Shift+Tab` — cycle focus through action cards +- `Enter` — open the focused card in a full-screen detail view +- `Esc` — drop the focus (or close the detail view) +- `↑↓ / PgUp / PgDn / g / G` — scroll inside the detail view +- `Ctrl+E` — return the chat transcript to live mode + ## Architecture ``` @@ -170,23 +209,32 @@ Every session is persisted to `~/.agent-forge/sessions//transcript.jsonl`. 
U │ │ │ forge CLI (= the builder LLM) │ │ ├─ Ink TUI (Mission Control + conversation) │ -│ ├─ AGENT.md parser (Zod-validated frontmatter) │ -│ ├─ FileWrite tool (sandboxed under ~/.agent-forge) │ +│ ├─ Skill catalog : built-in + ~/.agent-forge/skills/ │ +│ ├─ Server-side trigger matcher + skill runner │ +│ ├─ AGENT.md / SKILL.md parsers (Zod-validated) │ +│ ├─ FileWrite tool (host, sandboxed under ~/.agent-forge) │ │ └─ DockerLaunch tool (spawns one-shot containers) │ └────────────────────┬────────────────────────────────────────┘ │ docker run --rm -i + │ -v /AGENT.md:/agent/AGENT.md:ro + │ -v :/runtime:ro + │ -v :/workspace ▼ ┌─────────────────────────────────────────────────────────────┐ │ CONTAINER (one per agent run, disposable) │ │ agent-forge/base:latest │ │ │ │ Node runtime ── reads /agent/AGENT.md as system prompt │ -│ └─ pipes the user prompt through stdin │ -│ └─ streams the LLM answer to stdout │ +│ ├─ pipes the user prompt through stdin │ +│ ├─ streams the LLM answer to stdout │ +│ └─ tool loop : forge:bash / write / read / │ +│ edit / grep / glob, capped at maxTurns │ +│ │ +│ /workspace ── writable scratchpad, kept on host after exit │ └─────────────────────────────────────────────────────────────┘ ``` -Long-running agents (`docker exec`) and multi-agent teams (one container, many processes coordinating via [`claude-presence`](https://github.com/garniergeorges/claude-presence)) land in P5 and P7. +Persistent agents (`docker exec` instead of `docker run --rm`) and multi-agent teams (one container, many processes coordinating via [`claude-presence`](https://github.com/garniergeorges/claude-presence)) land in P5 and P7. 
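
The three bind mounts shown in the diagram can be sketched as an argument builder for `docker run`. The host-side paths are elided in the diagram, so they are plain parameters here; `LaunchPaths` and `buildDockerRunArgs` are hypothetical names, and the actual `DockerLaunch` tool may pass additional flags (environment variables, container name, the command to run).

```typescript
// Hypothetical sketch: builds the argv for a one-shot agent container.
// Only the flags visible in the diagram are included; everything else
// (env vars, container name, entry command) is deliberately left out.
type LaunchPaths = {
  agentMdPath: string // host path to the agent's AGENT.md (assumed parameter)
  runtimePath: string // host path mounted at /runtime (assumed parameter)
  workspaceDir: string // per-run host directory mounted at /workspace
}

function buildDockerRunArgs(
  paths: LaunchPaths,
  image: string = 'agent-forge/base:latest',
): string[] {
  return [
    'run',
    '--rm', // container is removed on exit
    '-i', // keep stdin open so the prompt can be piped in
    '-v', `${paths.agentMdPath}:/agent/AGENT.md:ro`,
    '-v', `${paths.runtimePath}:/runtime:ro`,
    '-v', `${paths.workspaceDir}:/workspace`, // read-write, survives the container
    image,
  ]
}
```

Only `/workspace` is mounted without `:ro`, which is what lets artifacts remain on the host after the container is gone.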
## Tech stack @@ -203,14 +251,20 @@ Long-running agents (`docker exec`) and multi-agent teams (one container, many p ``` agent-forge/ ├── packages/ -│ ├── core/ # builder LLM, AGENT.md schema, provider config -│ ├── cli/ # the `forge` binary (Ink REPL + Mission Control) -│ ├── runtime/ # bundle that runs inside each agent container -│ └── tools-core/ # FileWrite, DockerLaunch, … -├── docker/ # Dockerfiles -├── scripts/ # build helpers (docker, hooks) -├── demo-sprites/ # interactive mockup (UX reference) -└── assets/ # README images +│ ├── core/ # builder LLM, schemas, skill layer +│ │ └── src/builder/skills/ # built-in SKILL.md files +│ ├── cli/ # the `forge` binary (Ink REPL + Mission Control) +│ ├── runtime/ # bundle that runs inside each agent container +│ │ └── src/tool-protocol.ts # forge:* parser + result renderers +│ └── tools-core/ +│ ├── file-write.ts # host-side FileWrite (~/.agent-forge) +│ ├── docker-launch.ts # one-shot container launcher +│ └── runtime/ # in-container tools : bash, file-write, +│ # file-read, file-edit, grep, glob +├── docker/ # Dockerfiles +├── scripts/ # build helpers (docker, hooks) +├── demo-sprites/ # interactive mockup (UX reference) +└── assets/ # README images ``` ## Genesis diff --git a/packages/cli/README.md b/packages/cli/README.md index 50d0d5f..86bac92 100644 --- a/packages/cli/README.md +++ b/packages/cli/README.md @@ -4,20 +4,25 @@ Binaire `forge` — CLI conversationnelle. ## Ce que ça fait -Héberge le **builder LLM** dans un REPL Ink. L'utilisateur décrit ce qu'il veut, le builder génère des fichiers `AGENT.md` (P3) puis `TEAM.md` (P7) et lance les containers Docker correspondants. +Héberge le **builder LLM** dans un REPL Ink. L'utilisateur décrit ce qu'il veut, le builder génère des fichiers `AGENT.md` puis lance les containers Docker correspondants. Quand le message déclenche une **skill**, la CLI prend la main et orchestre directement (deux appels LLM ciblés au lieu d'un wide). 
## État -**Phase POC, P3 livré.** Couvre : +**Phase POC, P1 → P6 livrés.** Couvre : - REPL Ink bilingue EN/FR (sélecteur de langue au premier lancement) - Splash + preflight checks (Docker dispo, image base, runtime bundle) -- Mission Control (zone haute) — affiche les actions du builder (write, run) avec coloration syntaxique YAML +- Mission Control (zone haute) — affiche les actions du builder (write, run, skill) avec : + - mode compact 1-ligne par défaut, expand sur la card focus + - viewport scrollable avec indicateurs `↑ N above / ↓ N below` + - auto-focus de la nouvelle card arrivée, running cards toujours expandées + - vue détail plein écran (Enter), highlight Markdown/YAML/JSON/agent-run - Conversation (zone basse) — uniquement le langage naturel, transcripts persistés en JSONL - Permission dialog (Y / N / D) avant toute écriture ou lancement -- Slash commands : `/help`, `/clear`, `/reset`, `/lang`, `/provider`, `/model`, `/session`, `/sessions`, `/exit` +- Slash commands : `/help`, `/clear`, `/reset`, `/lang`, `/provider`, `/model`, `/session`, `/sessions`, `/skills`, `/exit` - Provider-agnostic via Vercel AI SDK (Mistral, OpenAI, MLX local…) - Sessions persistées dans `~/.agent-forge/sessions//transcript.jsonl` +- **Couche skills** : matching des triggers côté serveur, dispatch automatique vers le runner `scaffold-and-run` quand un trigger matche ## Lancement @@ -36,37 +41,44 @@ bun run forge # depuis la racine du monorepo /model change de modèle sur le provider actif /session affiche l'id de la session courante /sessions liste les sessions persistées +/skills liste les skills disponibles (built-in + user) /exit quitte ``` ## Raccourcis clavier ``` -[⏎] envoyer -[PgUp/PgDn] scroll dans le transcript -[Ctrl+E] retour au live -[Y/N/D] approuver / refuser / aperçu (dialog de permission) +[⏎] envoyer un message +[PgUp/PgDn] scroll Mission Control (si focus actif ou input vide) + sinon scroll dans le transcript +[Ctrl+E] retour live dans le transcript 
+[Tab/Shift+Tab] cycle focus entre les cards Mission Control +[Enter] sur focus ouvre la card en détail plein écran +[Esc] retire le focus, ou ferme la vue détail +[Y/N/D] approuve / refuse / aperçu (dialog de permission) ``` ## Structure ``` src/ -├── index.tsx entrée Ink -├── App.tsx layout deux zones (Mission Control xor Splash, puis Welcome) +├── index.tsx entrée Ink +├── App.tsx layout + routage clavier global ├── components/ -│ ├── MissionControl.tsx zone haute, cards d'actions +│ ├── MissionControl.tsx zone haute, cards compactes / expandées + viewport +│ ├── CardDetail.tsx vue plein écran d'une card focus │ ├── ProviderLogo.tsx logo pixel art du provider actif │ ├── Welcome.tsx zone basse (header + transcript + prompt + footer) │ ├── ChatViewport.tsx transcript scrollable │ ├── ConfirmAction.tsx dialog de permission Y/N/D │ ├── Splash.tsx écran de boot -│ └── syntax.ts highlighter YAML / plain +│ └── syntax.ts highlighters YAML / Markdown / JSON / agent-run ├── hooks/ -│ ├── useChat.ts state machine (messages, actions, streaming) +│ ├── useChat.ts state machine (messages, actions, streaming, dispatch skills) +│ ├── useCardFocus.ts focus + scrollTop + auto-focus + auto-scroll │ └── useChatContext.tsx React context wrapper -├── actions/ types Action (write, run) -├── builder-actions.ts parser des blocs forge:write / forge:run +├── actions/ types Action (write, run, skill) +├── builder-actions.ts parser des blocs forge:write / forge:run / forge:skill ├── commands.ts slash commands ├── config/ .env, presets providers, langue ├── i18n/ EN/FR strings @@ -76,4 +88,4 @@ src/ ## Suite -P4 — exposer six tools natifs (Bash, FileRead, FileEdit, FileWrite, Grep, Glob) au runtime, pour que les agents puissent agir sur leur propre `/workspace`. +P5 — sandbox durci, agents persistants via `docker exec`, extraction d'artefacts du `/workspace` vers le host. 
diff --git a/packages/core/README.md b/packages/core/README.md
index 3668fc8..4562ada 100644
--- a/packages/core/README.md
+++ b/packages/core/README.md
@@ -2,22 +2,27 @@
 
 Primitives de base d'Agent Forge.
 
-## Contenu (état P3)
+## Contenu (état P6)
 
 - **`builder/`** — l'agent LLM conversationnel qui conçoit les autres agents
   - `provider.ts` — résout `FORGE_BASE_URL` / `FORGE_API_KEY` / `FORGE_MODEL`, supporte les overrides à chaud (`/provider`, `/model`)
-  - `system-prompt.ts` — prompt système bilingue EN/FR avec ACTION PROTOCOL et RUN PROTOCOL (fenced blocks `forge:write` et `forge:run`)
-  - `stream.ts` — `streamBuilder({ messages, lang })` via Vercel AI SDK
+  - `system-prompt.ts` — prompt système bilingue EN/FR avec ACTION PROTOCOL et RUN PROTOCOL (fenced blocks `forge:write` et `forge:run`), plus la liste informationnelle des skills disponibles
+  - `stream.ts` — `streamBuilder({ messages, lang, skills })` via Vercel AI SDK
+  - **`skill-catalog.ts`** — discovery des `SKILL.md` (built-in dans `skills/`, utilisateur dans `~/.agent-forge/skills/`)
+  - **`skill-matcher.ts`** — match côté serveur des triggers (sous-chaîne insensible à la casse)
+  - **`skill-runner.ts`** — orchestration de `scaffold-and-run` (deux appels `generateText` ciblés, un pour AGENT.md, un pour le run prompt)
+  - **`skills/scaffold-and-run.md`** — première skill built-in
 - **`types/agent-md.ts`** — `parseAgentMd(text)` : sépare frontmatter / body, valide via Zod (name kebab-case, description non vide, sandbox.image, sandbox.timeout, maxTurns)
+- **`types/skill-md.ts`** — `parseSkillMd(text)` : même pattern pour les skills (name, description, triggers, actions)
 
 ## À venir
 
 - **`docker/`** — abstraction sandbox (P5 : agents persistants via `docker exec`, pas seulement `run --rm`)
-- **`tools/`** — interface `Tool` partagée (P4)
+- **`tools/`** — interface `Tool` partagée
 
 ## Dependencies
 
 - `ai`, `@ai-sdk/openai` — Vercel AI SDK pour les appels LLM provider-agnostic
-- `zod` — validation du frontmatter `AGENT.md`
-- `@modelcontextprotocol/sdk` — intégration MCP (P6+)
+- `zod` — validation du frontmatter `AGENT.md` et `SKILL.md`
+- `@modelcontextprotocol/sdk` — intégration MCP (P7+)
 - `yaml` — parsing du frontmatter
diff --git a/packages/runtime/README.md b/packages/runtime/README.md
index c852792..702f4df 100644
--- a/packages/runtime/README.md
+++ b/packages/runtime/README.md
@@ -2,16 +2,25 @@
 
 Le process qui tourne **à l'intérieur** des containers Docker lancés par Agent Forge.
 
-## Ce que ça fait (état P3)
+## Ce que ça fait (état P4)
 
 1. Lit le fichier `/agent/AGENT.md` monté en lecture seule dans le container
 2. Sépare le frontmatter (validé Zod côté host) du corps Markdown
-3. Utilise le corps comme **system prompt** de l'agent
+3. Utilise le corps comme **system prompt** de l'agent, plus une section TOOLS qui décrit les six tools disponibles
 4. Récupère le prompt utilisateur via stdin
-5. Streame la réponse du LLM (`streamText` du Vercel AI SDK) sur stdout, chunk par chunk
+5. **Tool loop multi-turns** :
+   - streame la réponse du LLM (`streamText` du Vercel AI SDK) sur stdout, chunk par chunk
+   - parse le premier bloc `forge:*` que l'agent émet
+   - exécute le tool correspondant (Bash / FileWrite / FileRead / FileEdit / Grep / Glob)
+   - réinjecte le résultat structuré comme message utilisateur dans la conversation
+   - boucle jusqu'à ce que l'agent réponde sans bloc OU que `maxTurns` soit atteint (cap dur à 10)
 6. Sort avec le code 0 quand le LLM a fini
 
-Le container est lancé avec `docker run --rm -i`, donc il est détruit dès la sortie.
+Le container est lancé avec `docker run --rm -i`, donc il est détruit dès la sortie. Le `/workspace` (bind-mount RW) est conservé sur le host pour inspection / extraction d'artefacts (P5).
+
+## Protocole tool agent-side
+
+Voir `src/tool-protocol.ts` pour le parser et les renderers de résultats. Les six tags reconnus sont `forge:bash`, `forge:write`, `forge:read`, `forge:edit`, `forge:grep`, `forge:glob`. Les résultats sont écrits sur stdout entre marqueurs `[forge:tool]` / `[/forge:tool]` pour que le host TUI puisse les router dans la card Mission Control.
 
 ## Variables d'environnement
 
@@ -21,6 +30,7 @@ Héritées du host par le `DockerLaunch` tool :
 FORGE_BASE_URL endpoint OpenAI-compatible
 FORGE_API_KEY clé (peut être vide pour MLX local)
 FORGE_MODEL nom du modèle
+FORGE_MAX_TOKENS optionnel, default 1024 par tour
 ```
 
 ## Build
@@ -33,7 +43,6 @@ Produit `dist/runtime.mjs`. **Cible Node, pas Bun** — les containers tournent
 
 ## À venir
 
-- **P4** — exposer six tools natifs (Bash, FileRead, FileEdit, FileWrite, Grep, Glob) à l'agent depuis l'intérieur du container
-- **P5** — agents persistants via `docker exec` (au lieu de `docker run --rm` jetable)
+- **P5** — sandbox durci (read-only root FS, network policy, resource caps), agents persistants via `docker exec` au lieu de `docker run --rm` jetable
 - **P5** — extraction d'artefacts du `/workspace` du container vers le host
-- **P6** — `claude-presence` MCP pour la coordination entre agents d'une même team
+- **P7** — `claude-presence` MCP pour la coordination entre agents d'une même team
diff --git a/packages/tools-core/README.md b/packages/tools-core/README.md
index e55ec9b..acff97f 100644
--- a/packages/tools-core/README.md
+++ b/packages/tools-core/README.md
@@ -2,32 +2,39 @@
 
 Tools natifs partagés entre le builder (côté host) et le runtime (dans le container).
 
-## État P3
+## État P4
 
-Deux tools livrés et utilisés dans le parcours `forge` :
+### Tools host
 
-- **`FileWrite`** — écrit sous `~/.agent-forge/agents//` avec sandbox de chemin (refuse tout `..`, refuse les écrasements sauf `overwrite: true` quand l'utilisateur a confirmé dans le dialog de permission). Schéma Zod sur l'input.
-- **`DockerLaunch`** — `launchAgent({ agent, prompt })` : retourne un handle `{ containerName, events: AsyncGenerator, abort }`. Spawn `docker run --rm -i`, monte `AGENT.md` + le bundle runtime, hérite des env vars provider, force le cleanup en `try/finally`.
+Utilisés par le builder pour préparer / lancer les agents :
 
-## Tools prévus pour P4
+- **`FileWrite`** (`src/file-write.ts`) — écrit sous `~/.agent-forge/agents//` avec sandbox de chemin (refuse tout `..`, refuse les écrasements sauf `overwrite: true` quand l'utilisateur a confirmé dans le dialog de permission). Schéma Zod sur l'input.
+- **`DockerLaunch`** (`src/docker-launch.ts`) — `launchAgent({ agent, prompt })` retourne un handle `{ containerName, events: AsyncGenerator, abort }`. Spawn `docker run --rm -i`, monte `AGENT.md` + le bundle runtime + un `/workspace` RW propre par run, hérite des env vars provider, force le cleanup en `try/finally`.
 
-Depuis l'intérieur du container, accessibles à l'agent :
+### Tools runtime (in-container)
 
-- **`Bash`** — exécution shell, restreinte au `/workspace`
-- **`FileRead`** — lecture avec offset/limit
-- **`FileEdit`** — patch par `old_string` / `new_string`
-- **`FileWrite`** — version "in-container" (différente de la version builder host)
-- **`Grep`** — recherche ripgrep
-- **`Glob`** — pattern matching
+Utilisés par les agents eux-mêmes via le tool-loop du runtime, tous sandboxés sous `/workspace` :
+
+- **`Bash`** (`src/runtime/bash.ts`) — exécution shell (`bash -lc`), timeout 30 s par défaut (max 120 s), output clippé à 16 Ko
+- **`FileWrite`** (`src/runtime/file-write.ts`) — version in-container, écrase par défaut (différente de la version host qui est stricte)
+- **`FileRead`** (`src/runtime/file-read.ts`) — offset/limit en lignes, clip 16 Ko, refuse les non-fichiers
+- **`FileEdit`** (`src/runtime/file-edit.ts`) — patch par sous-chaîne exacte, refuse les matchs ambigus sauf `replaceAll: true`
+- **`Grep`** (`src/runtime/grep.ts`) — regex JS pure sur un filtre glob optionnel, ignore les binaires (octets nuls dans les 4 Ko de tête), 200 hits max, lignes clippées à 400 chars
+- **`Glob`** (`src/runtime/glob.ts`) — matcher fait main pour `*` / `**` / `?`, 200 résultats max, walk borné à 5000 nodes
+
+Tous les tools runtime utilisent `resolveSandboxedPath` pour valider les chemins. La racine sandbox est `/workspace` en production ; pour les tests, `FORGE_WORKSPACE` peut la rediriger vers un dossier temp.
 
 ## Interface tool
 
-```ts
-type Tool = {
-  name: string
-  schema: ZodSchema
-  run(input: Input, ctx: ToolContext): AsyncGenerator
-}
-```
+Pattern Vercel AI SDK : Zod schema + fonction pure `execute*` qui retourne un résultat structuré (`{ ok: true, … }` ou `{ ok: false, error: string }`). Pas d'instances ni d'effets cachés — chaque appel est self-contained, ce qui simplifie les tests.
+
+## Tests
 
-Pattern emprunté à l'analyse OpenClaude (`../../analyse/06-tools-system.md`).
+`tests/` couvre :
+- `file-write.test.ts` — host FileWrite (path safety, sandbox escape, refus d'écrasement)
+- `runtime-bash.test.ts` — stdout / stderr / exit / timeout / cwd
+- `runtime-file-write.test.ts` — sandbox escape, traversal, écrasement, parent-dir
+- `runtime-file-read.test.ts` — offset/limit, fichier manquant, sandbox escape
+- `runtime-file-edit.test.ts` — match unique, ambiguïté, replaceAll, missing oldString
+- `runtime-grep.test.ts` — case sensitivity, glob filter, regex invalide
+- `runtime-glob.test.ts` — `**/*`, `*` mono-segment, `?`, no-match