🐛 Keep Scoutbot on local authority #178
Merged
SCO-059 session-knowledge exploration as an interactive studio surface. Six-stage pipeline (discover → drilldown) walking a real Codex or Claude session through preparation. Discover scans ~/.codex/sessions and ~/.claude/projects against the actual filesystem; Normalize opens the selected JSONL and parses the head into uniform records. Introduces a studio-internal primitive: `Command<I, O>` + runCommand with per-input TTL caching, plus a CommandSurface chrome component that owns the copyable shell line, the ran/cached badge, and the output frame. The renderer registry is parked until the next consumer needs it. Also lands the parallel KnowledgeSearchScreen sketch on the web product side for future cross-reference.
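A minimal sketch of what a `Command<I, O>` plus `runCommand` pair with per-input TTL caching could look like. Only those two names come from the commit; the fields (`key`, `ttlMs`, `shell`, `run`), the `CommandRun` shape, and the in-memory `Map` cache are assumptions for illustration, not the PR's API.

```typescript
// Sketch only: field names and cache layout are assumptions, not the PR's API.
export interface Command<I, O> {
  id: string;                // e.g. "parse-session"
  ttlMs: number;             // how long a cached output stays fresh
  key(input: I): string;     // cache key derived from the relevant input fields
  shell(input: I): string;   // copyable shell-equivalent line for the UI
  run(input: I): Promise<O>;
}

export interface CommandRun<O> {
  commandId: string;
  output: O;
  cached: boolean;           // drives the ran / cached badge
  wallMs: number;            // time spent in run(); 0 on a cache hit
  savedMs: number;           // on a hit, the wall time of the run that filled the cache
  shell: string;
}

interface CacheEntry {
  output: unknown;
  storedAt: number;
  wallMs: number;
}

const cache = new Map<string, CacheEntry>();

export async function runCommand<I, O>(
  cmd: Command<I, O>,
  input: I,
  now: () => number = Date.now, // injectable clock so expiry is testable
): Promise<CommandRun<O>> {
  const cacheKey = `${cmd.id}:${cmd.key(input)}`;
  const shell = cmd.shell(input);
  const hit = cache.get(cacheKey);
  if (hit !== undefined && now() - hit.storedAt < cmd.ttlMs) {
    return { commandId: cmd.id, output: hit.output as O, cached: true, wallMs: 0, savedMs: hit.wallMs, shell };
  }
  const start = performance.now();
  const output = await cmd.run(input);
  const wallMs = performance.now() - start;
  cache.set(cacheKey, { output, storedAt: now(), wallMs });
  return { commandId: cmd.id, output, cached: false, wallMs, savedMs: 0, shell };
}
```

Keying on command id plus a caller-derived input key lets inputs that differ only in irrelevant fields share one entry; the injectable clock exists only to make expiry easy to exercise.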
Remove Extract/Index/Query/Drilldown panels and the WeekBudgetFooter:
their contents were synthesized hand-coded outputs, not real runs of the
described work. Per direction to "not do anything fake", the page now
only shows stages with real implementations behind them:
- Discover: real filesystem scan of ~/.codex/sessions and ~/.claude/projects
- Normalize: real JSONL parse of the selected session, with raw → normalized
inspect view
Pipeline strip narrows to 2 chips. ~700 lines of dead illustration code
removed (StudySelection narrows to {sessionId, stageId}, helpers gone).
Also unwraps Codex response_item / event_msg / turn_context wrappers in
the normalizer so the stream shows real user_turn / assistant_turn /
command_or_tool / observation kinds instead of opaque system_record.
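The wrapper unwrapping can be sketched as a small classifier. The Codex JSONL shape assumed here (an outer `type` plus a `payload`, with `message` / `function_call` / `function_call_output` payload types) is an assumption about the format, not taken from the PR; `normalizeLine` and `NormalizedRecord` are hypothetical names.

```typescript
// Sketch only: the wrapper/payload field names are assumptions about the Codex JSONL shape.
type Kind = "user_turn" | "assistant_turn" | "command_or_tool" | "observation" | "system_record";

interface NormalizedRecord {
  index: number;        // source order in the JSONL file
  kind: Kind;
  wrapper: string;      // original outer type, kept for the raw → normalized inspect view
  payloadType?: string;
}

const WRAPPERS = new Set(["response_item", "event_msg", "turn_context"]);

function classify(payload: Record<string, unknown>): Kind {
  if (payload.type === "message") {
    if (payload.role === "user") return "user_turn";
    if (payload.role === "assistant") return "assistant_turn";
  }
  if (payload.type === "function_call") return "command_or_tool";
  if (payload.type === "function_call_output") return "observation";
  // Anything else (including most event_msg payloads) stays system_record in this sketch.
  return "system_record";
}

export function normalizeLine(line: string, index: number): NormalizedRecord {
  const raw = JSON.parse(line) as { type?: string; payload?: Record<string, unknown> };
  const wrapper = raw.type ?? "unknown";
  if (WRAPPERS.has(wrapper) && raw.payload && typeof raw.payload === "object") {
    const payloadType = typeof raw.payload.type === "string" ? raw.payload.type : undefined;
    return { index, kind: classify(raw.payload), wrapper, payloadType };
  }
  return { index, kind: "system_record", wrapper };
}
```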
Real Extract stage on the session-search workbench. Selecting Extract on a real session runs one extraction end-to-end and writes 6 files to `$TMPDIR/scout-study/qmd/<session>/`:

Mechanical pass (no LLM, iterates normalized records):
- manifest.json: source path, harness, recordsScanned, bytesRead
- files.md: paths pulled from tool args, grouped by hits + tools touched
- tool-calls.md: counts per tool name + first 30 invocations
- events-NNN.md: windowed event lines with source-ordered indices

LLM pass (one MiniMax-M2 call per session, cached for 1h):
- overview.md: what the session was about (2 paragraphs)
- decisions.md: decisions made + open follow-ups
- _llm-call.json: model, usage, latency for inspection

Adds the studio command primitive's first real LLM consumer. Secret access via lib/secrets.ts shells out to `secret get` (keychain), so no env files are touched. The MiniMax client in lib/llm/minimax.ts returns content + reasoning + structured usage so the panel can surface real cost. ExtractPanel renders the file list (mech / llm tagged), a live preview of the selected artifact, and a footnote with the real disk path, timings, and token counts. Pipeline strip now shows Discover → Normalize → Extract, all real.
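As an illustration of the mechanical pass, a hypothetical `renderToolCalls` that produces the counts-plus-first-30 layout described for tool-calls.md. The `ToolCall` shape and the exact markdown headings are assumptions.

```typescript
// Sketch only: ToolCall and the markdown layout are assumptions.
interface ToolCall {
  index: number;  // source-ordered record index
  tool: string;   // tool name, e.g. "shell"
  args: string;   // one-line argument summary
}

export function renderToolCalls(calls: ToolCall[], firstN = 30): string {
  const counts = new Map<string, number>();
  for (const c of calls) counts.set(c.tool, (counts.get(c.tool) ?? 0) + 1);
  // Most-used tools first; ties broken alphabetically so the file is stable across runs.
  const ranked = [...counts.entries()].sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]));
  const lines = [
    "## Counts",
    ...ranked.map(([tool, n]) => `- ${tool}: ${n}`),
    "",
    `## First ${Math.min(firstN, calls.length)} invocations`,
  ];
  for (const c of calls.slice(0, firstN)) lines.push(`- [${c.index}] ${c.tool} ${c.args}`);
  return lines.join("\n") + "\n";
}
```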
Every page render now ends with a run trace showing what just happened underneath: one row per command (inventory, parse-session, extract-qmd), with wall time, cached status (with a "saved ~Nms" callout), and the LLM model + token breakdown for entries that ran a model. The header summarizes the request as a whole: wall ms (cached entries contribute 0), number of cached entries and ms saved, cumulative LLM tokens, and a rough USD cost from a per-model rate table (only MiniMax-M2 for now). The page handler now orchestrates all commands explicitly so the run log can be built before rendering. NormalizePanel and ExtractPanel become presentational (they receive their CommandRun as a prop). This is the "what's happening underneath" view the workbench needed: when a fresh session triggers a real LLM call you see Extract QMD · ran · 7385 ms · MiniMax-M2 · 1002+681t, and on a revisit you see cached (saved ~Nms) with the original cost still visible, so you understand what the cache bought.
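The header arithmetic can be sketched as a fold over run-log entries. `summarizeRun`, `RunLogEntry`, and `RateTable` are hypothetical; rates are supplied by the caller because no pricing appears in the PR, and this sketch assumes tokens and cost are counted for cached entries too, so the original cost stays visible.

```typescript
// Sketch only: these shapes are assumptions, not the PR's types.
interface RunLogEntry {
  commandId: string;
  cached: boolean;
  wallMs: number; // wall time of the run that produced this output
  llm?: { model: string; inputTokens: number; outputTokens: number };
}

// USD per million tokens, keyed by model name. The caller supplies real rates.
type RateTable = Record<string, { inPerM: number; outPerM: number }>;

interface RunSummary {
  wallMs: number;
  cachedCount: number;
  savedMs: number;
  tokens: number;
  usd: number;
}

export function summarizeRun(entries: RunLogEntry[], rates: RateTable): RunSummary {
  const s: RunSummary = { wallMs: 0, cachedCount: 0, savedMs: 0, tokens: 0, usd: 0 };
  for (const e of entries) {
    if (e.cached) {
      s.cachedCount += 1;
      s.savedMs += e.wallMs; // a hit "saves" what the original run cost
    } else {
      s.wallMs += e.wallMs;  // cached entries contribute 0 to wall time
    }
    if (e.llm) {
      s.tokens += e.llm.inputTokens + e.llm.outputTokens;
      const rate = rates[e.llm.model];
      if (rate) {
        s.usd += (e.llm.inputTokens * rate.inPerM + e.llm.outputTokens * rate.outPerM) / 1_000_000;
      }
    }
  }
  return s;
}
```

Summing each hit's original wall time is what turns the per-row "saved ~Nms" callout into a per-request total.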
The page handler now only awaits the cheap inventory before returning. Everything panel-dependent (parse, extract, enrich) moves into an async `<StageBody>` wrapped in `<Suspense>`. When you click a session pill or a pipeline chip, the chrome (header, picker, pipeline strip, stage header, prev/next nav) renders instantly while the stage body shows a structured in-flight skeleton. The skeleton lists the commands that will run for the active stage with expected timing: Inventory · Parse session · Extract QMD · Enrich (LLM). For Enrich the user immediately sees "first run: 5–15 s", so the wait is explained, not opaque. The Suspense boundary is keyed on session+stage so each click resets it cleanly. RunSummary moves inside StageBody so its log is built after all panel commands resolve. This pairs with the existing run trace footer: skeleton = expected sequence, trace = what actually ran.
Three workbench polish moves prompted by usage:
1. ArtifactPicker client island — Extract / Enrich panels pre-read every
artifact's content server-side and hand the array to a "use client"
component that swaps preview body via local state. URL stays in sync
via router.replace (pushState) so links are shareable, but clicking a
file does not navigate, does not refetch, does not flash the chrome.
Verified zero document loads + zero HTTP doc requests on click.
2. Force re-run — runCommand learns an optional { force: true } that
bypasses the cache lookup. URL `?force=<stageId>` threads it through
to the active stage; CommandSurface gains a "re-run ↻" link in the
command header. Lets you see Extract / Normalize / Enrich actually
run instead of always reading "cached".
3. Drop "(mechanical)" labels now that Enrich is split out — Extract's
single output kind speaks for itself.
Suspense boundary key includes the force param so a re-run triggers the
in-flight skeleton properly.
- parseSessionCommand reads from the head in 64 KiB chunks until it has
`limit` newlines (or EOF). Previously it read one 128 KiB buffer, which
capped any session at whatever fit in that window — Normalize at
limit=14 was always ~20 ms regardless of source size.
- Lift Normalize's record limit from 14 to 1500, matching Extract /
Enrich so all three share the parse-session cache key. Force re-running
Normalize now shows a meaningful workload (~150–340 ms on codex-large
parsing 1500 records).
- NormalizedStreamBody caps the visible stream at 30 rows + a "N parsed
but hidden · M not parsed" tail so the page stays compact.
- Force handling expands: `?force=all` bypasses every command's cache in the
active pipeline; `?force=<stageId>` still works for a single stage.
Inventory honors force too via the discover stage id.
- Three new re-run affordances:
* "re-run all ↻" link in the page header (force=all)
* "re-run ↻" already exists in each CommandSurface header
* per-row "re-run ↻" links in the run trace footer, mapping command id
to its stage so you can rerun any single step from the trace
Suspense key includes the force param so the in-flight skeleton appears
on force-rerun.
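Force resolution reduces to a lookup from command id to stage id. The mapping below mirrors the command and stage names used in these commits (inventory → discover, and so on), but the table, `shouldForce`, and `rerunHref` are hypothetical.

```typescript
// Sketch only: this table mirrors names used in the commits; the mapping itself is assumed.
const COMMAND_STAGE = {
  "inventory": "discover",
  "parse-session": "normalize",
  "extract-qmd": "extract",
  "enrich-session": "enrich",
} as const;

type CommandId = keyof typeof COMMAND_STAGE;

// Should this command skip its cache lookup for the current request?
export function shouldForce(force: string | null, commandId: CommandId): boolean {
  if (force === null || force === "") return false;
  if (force === "all") return true;              // ?force=all: every command in the active pipeline
  return force === COMMAND_STAGE[commandId];     // ?force=<stageId>: only that stage's command
}

// href for a per-row "re-run ↻" link in the run trace footer.
export function rerunHref(currentHref: string, commandId: CommandId): string {
  const url = new URL(currentHref, "http://workbench.invalid"); // base only used to parse relative hrefs
  url.searchParams.set("force", COMMAND_STAGE[commandId]);      // replaces any existing force value
  return url.pathname + url.search;
}
```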
Two responsiveness wins:
1. Run-in-place: a RerunLink Client Component wraps every re-run affordance (header "re-run all", in-CommandSurface "re-run", per-row trace re-runs). It uses useTransition so the previous UI stays mounted while the navigation streams in. While the request is in flight, the link swaps its label to "running ↻", sets aria-busy=true, and pulses (animate-pulse + status-info-fg). The stage Suspense key drops the force param, so a force-rerun navigation no longer unmounts the panel; old data stays visible until the new render lands. Session / stage switches still show the skeleton (the key still changes on those).
2. Trace inspect: each run-trace row is now a `<details>` that expands to show the actual shell-equivalent, input summary, output summary, and resolved cache key for that command's run. makeRunLogEntry gains summarizeInput / summarizeOutput hooks so each command projects its inputs and outputs into one-line strings for the inspect drawer.
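A sketch of `makeRunLogEntry` with the two projection hooks. The hook names come from the commit; the entry fields, the JSON fallback, and the 80-character truncation are assumptions.

```typescript
// Sketch only: hook names are from the commit; everything else is assumed.
interface Summarizers<I, O> {
  summarizeInput?: (input: I) => string;
  summarizeOutput?: (output: O) => string;
}

interface InspectableEntry {
  commandId: string;
  cached: boolean;
  wallMs: number;
  shell: string;
  cacheKey: string;
  inputSummary: string;
  outputSummary: string;
}

// Fallback when a command provides no projection: truncated one-line JSON.
function oneLine(v: unknown): string {
  const s = JSON.stringify(v) ?? String(v);
  return s.length > 80 ? s.slice(0, 79) + "…" : s;
}

export function makeRunLogEntry<I, O>(
  run: { commandId: string; cached: boolean; wallMs: number; shell: string; cacheKey: string; input: I; output: O },
  hooks: Summarizers<I, O> = {},
): InspectableEntry {
  return {
    commandId: run.commandId,
    cached: run.cached,
    wallMs: run.wallMs,
    shell: run.shell,
    cacheKey: run.cacheKey,
    inputSummary: (hooks.summarizeInput ?? oneLine)(run.input),
    outputSummary: (hooks.summarizeOutput ?? oneLine)(run.output),
  };
}
```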
All three command inputs (parse-session, extract-qmd, enrich-session) now accept `limit: number | "all"`. Defaults flip to "all" so every session is parsed end-to-end, not capped at a 1500-record head slice. readHeadLines streams from the file until EOF when the limit is Infinity, so memory stays bounded by the file size rather than a fixed buffer. parseSessionCommand.shell switches to `cat …` when limit is "all" so the copyable shell line stays honest. Measured on a fresh process:
- codex-large (13 MiB, 4,220 events) normalize force: ~620 ms
- claude-large (52 MiB, 12,009 events) normalize force: ~5.3 s

The NormalizedStreamBody display cap of 30 rows still applies; the tail text now reads "parsed but hidden" rather than "not parsed" once nothing is being skipped.
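A sketch of a chunked head reader that accepts `limit: number | "all"`, reads 64 KiB at a time, and stops at `limit` newlines or EOF. The signature and return shape of this `readHeadLines` are assumptions; `StringDecoder` is used so multi-byte UTF-8 characters that straddle a chunk edge decode correctly.

```typescript
import { open } from "node:fs/promises";
import { StringDecoder } from "node:string_decoder";

const CHUNK_BYTES = 64 * 1024;

// Hypothetical signature: up to `limit` complete lines from the start of the file.
export async function readHeadLines(
  path: string,
  limit: number | "all",
): Promise<{ lines: string[]; bytesRead: number }> {
  const max = limit === "all" ? Infinity : limit;
  const handle = await open(path, "r");
  const buf = Buffer.alloc(CHUNK_BYTES);
  const decoder = new StringDecoder("utf8"); // holds back partial multi-byte chars between chunks
  const lines: string[] = [];
  let pending = "";
  let bytesRead = 0;
  try {
    while (lines.length < max) {
      const { bytesRead: n } = await handle.read(buf, 0, CHUNK_BYTES, bytesRead);
      if (n === 0) break; // EOF
      bytesRead += n;
      pending += decoder.write(buf.subarray(0, n));
      let start = 0;
      let nl = pending.indexOf("\n", start);
      while (lines.length < max && nl !== -1) {
        lines.push(pending.slice(start, nl));
        start = nl + 1;
        nl = pending.indexOf("\n", start);
      }
      pending = pending.slice(start); // keep the unterminated remainder for the next chunk
    }
    pending += decoder.end();
    // A final line without a trailing newline still counts at EOF.
    if (lines.length < max && pending.length > 0) lines.push(pending);
  } finally {
    await handle.close();
  }
  return { lines, bytesRead };
}
```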
…x.db
Pipeline gains a fifth stage that turns the QMD sidecar files into a real, queryable data store. Selecting Index on a session walks every `$TMPDIR/scout-study/qmd/<session>/` directory, splits each markdown file into H2 sections, and writes rows into a better-sqlite3 db with FTS5. Schema:
- sessions (id, harness, indexed_at)
- documents (id, session_id, kind, path, bytes)
- chunks (id, document_id, ordinal, source_ref, text)
- chunks_fts (virtual FTS5 table over chunks.text, kept in sync by triggers)

Tried `bun --bun next dev` first for bun:sqlite; it ships, but it breaks the client Router with "Router action dispatched before initialization" errors, so the RerunLink useTransition pattern fails. Reverted to `next dev` + better-sqlite3: boring, works, FTS5 included.

IndexPanel shows the schema with live row counts, plus a "this session" breakdown so you can see what just landed. The run trace footer treats index-corpus as a regular command with input / output summaries and a re-run link. Verified end to end with a sqlite3 CLI snippet from outside the app:
- 1.17 MB db file on disk after one session
- chunks_fts MATCH 'VOX' returns real source-anchored snippets from decisions.md and events-001.md
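The H2 split that feeds `chunks` can be sketched as a line scan. `splitH2` and the `<path>#<heading>` format for `source_ref` are assumptions; this naive version also splits on `## ` lines inside fenced code blocks.

```typescript
// Sketch only: splitH2 and the source_ref format are hypothetical.
interface Chunk {
  ordinal: number;   // position within the document
  sourceRef: string; // assumed format: "<path>#<heading>"
  text: string;
}

export function splitH2(path: string, markdown: string): Chunk[] {
  const chunks: Chunk[] = [];
  let heading = "";            // text before the first "## " line gets an empty heading
  let body: string[] = [];
  const flush = () => {
    const text = body.join("\n").trim();
    if (text.length > 0) chunks.push({ ordinal: chunks.length, sourceRef: `${path}#${heading}`, text });
  };
  for (const line of markdown.split("\n")) {
    const m = /^##\s+(.*)$/.exec(line); // "### sub" does not match, so H3s stay inside their H2
    if (m) {
      flush();
      heading = (m[1] ?? "").trim();
      body = [line];
    } else {
      body.push(line);
    }
  }
  flush();
  return chunks;
}
```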
New /studies/data surface that makes the shape of the session-search index legible: schema, shortcuts deck, unified MATCH/SELECT Query card with a strategy registry, and an Ask field that turns a question into FTS5 hits via local stopword stripping — sub-millisecond, no proprietary tokeniser in the loop. Same Ask surface also lands as Step 6 in the session-search pipeline workbench with schema-aware suggestion chips.
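The Ask field's question-to-query step can be sketched as: lowercase, split, drop stopwords, and OR the remaining terms together. The stopword list, the OR join, and `questionToMatch` are assumptions; the PR only says it uses local stopword stripping.

```typescript
// Sketch only: an illustrative stopword list; a real one would be longer.
const STOPWORDS = new Set([
  "a", "about", "an", "and", "are", "did", "do", "does", "for", "how", "in", "is", "it",
  "of", "on", "or", "the", "to", "was", "we", "what", "when", "where", "which", "who", "why", "with",
]);

export function questionToMatch(question: string): string | null {
  const terms = question
    .toLowerCase()
    .split(/[^\p{L}\p{N}_]+/u) // split on anything that is not a letter, digit, or underscore
    .filter((t) => t.length > 1 && !STOPWORDS.has(t)); // also drops empty and single-char tokens
  const unique = [...new Set(terms)];
  if (unique.length === 0) return null;
  // Double-quote each term so FTS5 reads it as a plain string rather than query syntax.
  return unique.map((t) => `"${t}"`).join(" OR ");
}
```

Joining with OR favours recall and leaves ordering to FTS5's bm25 ranking; space-separated terms (implicit AND) would be stricter.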
Summary
`scoutbot` from being overwritten by peer mesh sync

Verification
- `bun test packages/web/server/scoutbot/runner.test.ts`
- `bun test packages/runtime/src/broker-daemon.test.ts --test-name-pattern "keeps node-local scoutbot authority"`
- `bun test packages/runtime/src/local-agents.test.ts --test-name-pattern "session warmup|warmed local session"`
- `npm --prefix packages/runtime run check`
- `/api/send` to Scoutbot completed flight `flt-mpvek2j5-9t1uhl` with output `OK`