🐛 Keep Scoutbot on local authority #178

Merged: arach merged 15 commits into main from codex/search-workbench-studio-view on Jun 1, 2026

arach (Owner) commented Jun 1, 2026

Summary

  • keep node-local product agents like scoutbot from being overwritten by peer mesh sync
  • make Scoutbot bootstrap refresh registration when ownership drifts away from the local node
  • clear stale local-session endpoint failure metadata and mark warmed sessions idle so Scoutbot can run again

Verification

  • bun test packages/web/server/scoutbot/runner.test.ts
  • bun test packages/runtime/src/broker-daemon.test.ts --test-name-pattern "keeps node-local scoutbot authority"
  • bun test packages/runtime/src/local-agents.test.ts --test-name-pattern "session warmup|warmed local session"
  • npm --prefix packages/runtime run check
  • manual web-runner smoke: /api/send to Scoutbot completed flight flt-mpvek2j5-9t1uhl with output OK

arach added 13 commits May 30, 2026 14:41
SCO-059 session-knowledge exploration as an interactive studio surface.
Six-stage pipeline (discover → drilldown) walking a real Codex or Claude
session through preparation. Discover scans ~/.codex/sessions and
~/.claude/projects against the actual filesystem; Normalize opens the
selected JSONL and parses the head into uniform records.

Introduces a studio-internal primitive: Command<I, O> + runCommand with
per-input TTL caching, plus a CommandSurface chrome component owning
copyable shell line, ran/cached badge, and output frame. Renderer
registry is parked for the next consumer.
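
A minimal sketch of what a Command<I, O> + runCommand pair with per-input TTL caching could look like. The field names (id, ttlMs, key, run), the CommandRun shape, and the in-memory Map are assumptions for illustration, not the studio's actual types.

```typescript
// Hypothetical command shape: metadata plus an async run function.
interface Command<I, O> {
  id: string;
  ttlMs: number;               // how long a cached result stays fresh
  key: (input: I) => string;   // per-input cache key
  run: (input: I) => Promise<O>;
}

interface CommandRun<O> {
  output: O;
  cached: boolean;
  wallMs: number;              // 0 when served from cache
}

// Process-local cache, keyed by command id + per-input key.
const cache = new Map<string, { output: unknown; expiresAt: number }>();

async function runCommand<I, O>(
  cmd: Command<I, O>,
  input: I,
  now: () => number = Date.now, // injectable clock, only to make expiry testable
): Promise<CommandRun<O>> {
  const cacheKey = `${cmd.id}:${cmd.key(input)}`;
  const hit = cache.get(cacheKey);
  if (hit && hit.expiresAt > now()) {
    return { output: hit.output as O, cached: true, wallMs: 0 };
  }
  const started = now();
  const output = await cmd.run(input);
  const wallMs = now() - started;
  cache.set(cacheKey, { output, expiresAt: now() + cmd.ttlMs });
  return { output, cached: false, wallMs };
}
```

Calling runCommand twice with the same input inside the TTL returns cached: true and skips run; a different input, or an expired entry, runs the command again.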

Also lands the parallel KnowledgeSearchScreen sketch on the web product
side for future cross-reference.

Remove Extract/Index/Query/Drilldown panels and the WeekBudgetFooter:
their contents were synthesized hand-coded outputs, not real runs of the
described work. Per direction to "not do anything fake", the page now
only shows stages with real implementations behind them:

- Discover: real filesystem scan of ~/.codex/sessions and ~/.claude/projects
- Normalize: real JSONL parse of the selected session, with raw → normalized
  inspect view

Pipeline strip narrows to 2 chips. ~700 lines of dead illustration code
removed (StudySelection narrows to {sessionId, stageId}, helpers gone).

Also unwraps Codex response_item / event_msg / turn_context wrappers in
the normalizer so the stream shows real user_turn / assistant_turn /
command_or_tool / observation kinds instead of opaque system_record.
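
A rough sketch of the unwrapping step, assuming each wrapper record carries the interesting record in a `payload` field and that inner records expose `type` and `role`. Both the record shape and the inner type names (message, function_call, function_call_output) are assumptions for illustration; only the wrapper names and the output kinds come from the commit message.

```typescript
type Kind =
  | "user_turn"
  | "assistant_turn"
  | "command_or_tool"
  | "observation"
  | "system_record";

// Assumed record shape: wrappers nest the real record under `payload`.
interface Inner { type?: string; role?: string }
interface RawRecord extends Inner { payload?: Inner }

const WRAPPER_TYPES = new Set(["response_item", "event_msg", "turn_context"]);

function classify(raw: RawRecord): Kind {
  // Look through the wrapper when there is one; otherwise classify the record itself.
  const isWrapper = raw.type !== undefined && WRAPPER_TYPES.has(raw.type);
  const inner: Inner = isWrapper && raw.payload ? raw.payload : raw;

  if (inner.type === "message" && inner.role === "user") return "user_turn";
  if (inner.type === "message" && inner.role === "assistant") return "assistant_turn";
  if (inner.type === "function_call") return "command_or_tool";
  if (inner.type === "function_call_output") return "observation";
  return "system_record"; // anything unrecognized stays opaque
}

function normalizeJsonl(jsonl: string): Kind[] {
  return jsonl
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => classify(JSON.parse(line) as RawRecord));
}
```

Without the unwrap, every wrapped line would fall through to system_record because only the outer `type` is visible.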

Real Extract stage on the session-search workbench. Selecting Extract on
a real session runs one extraction end-to-end and writes 6 files to
$TMPDIR/scout-study/qmd/<session>/:

Mechanical pass (no LLM, iterates normalized records):
- manifest.json — source path, harness, recordsScanned, bytesRead
- files.md — paths pulled from tool args, grouped by hits + tools touched
- tool-calls.md — counts per tool name + first 30 invocations
- events-NNN.md — windowed event lines with source-ordered indices
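
As an illustration of the mechanical pass, here is a small sketch of how the tool-calls.md data (counts per tool name plus the first 30 invocations) could be derived from normalized records. The NormalizedRecord fields `tool` and `args` are hypothetical; the real record shape is not shown in this PR.

```typescript
// Hypothetical normalized record; only `kind` values come from the commit text.
interface NormalizedRecord {
  kind: string;
  tool?: string;
  args?: string;
}

interface ToolCallSummary {
  counts: Array<[tool: string, count: number]>; // sorted by count, then name
  first: string[];                              // first N invocations in source order
}

function summarizeToolCalls(
  records: NormalizedRecord[],
  maxInvocations = 30,
): ToolCallSummary {
  const counts = new Map<string, number>();
  const first: string[] = [];
  for (const record of records) {
    if (record.kind !== "command_or_tool" || !record.tool) continue;
    counts.set(record.tool, (counts.get(record.tool) ?? 0) + 1);
    if (first.length < maxInvocations) {
      first.push(`${record.tool} ${record.args ?? ""}`.trim());
    }
  }
  const sorted = Array.from(counts.entries()).sort(
    (a, b) => b[1] - a[1] || a[0].localeCompare(b[0]),
  );
  return { counts: sorted, first };
}
```

No model is involved: the function is a single pass over the records, which is why this half of Extract is cheap to re-run.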

LLM pass (one MiniMax-M2 call per session, cached for 1h):
- overview.md — what the session was about (2 paragraphs)
- decisions.md — decisions made + open follow-ups
- _llm-call.json — model, usage, latency for inspection

Adds the studio command primitive's first real LLM consumer. Secret
access via lib/secrets.ts shells to `secret get` (keychain) so no env
files touched. MiniMax client in lib/llm/minimax.ts returns content +
reasoning + structured usage so the panel can surface real cost.

ExtractPanel renders the file list (mech / llm tagged), live preview of
the selected artifact, and a footnote with real disk path + timings +
token counts. Pipeline strip now shows Discover → Normalize → Extract,
all real.

Every page render now ends with a run trace showing what just happened
underneath: one row per command (inventory, parse-session, extract-qmd),
with wall time, cached status (with "saved ~Nms" callout), and the LLM
model + token breakdown for the entries that ran a model.

Header summarizes the request as a whole — wall ms (cached entries
contribute 0), number cached and ms saved, cumulative LLM tokens, rough
USD cost using a per-model rate table (MiniMax-M2 in for now).
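
The header arithmetic can be sketched roughly as follows. The RunLogEntry shape is an assumption, cached entries are assumed to still report the tokens of their original run, and the rate table values are placeholders, not real MiniMax pricing.

```typescript
// Hypothetical run-log entry; field names are illustrative.
interface RunLogEntry {
  command: string;
  wallMs: number; // time the command took when it actually ran
  cached: boolean;
  llm?: { model: string; inputTokens: number; outputTokens: number };
}

// USD per million tokens. PLACEHOLDER numbers, not real pricing.
const RATES_PER_MTOK: Record<string, { input: number; output: number }> = {
  "MiniMax-M2": { input: 1, output: 2 },
};

interface RunSummary {
  wallMs: number;      // cached entries contribute 0
  cachedCount: number;
  savedMs: number;     // time the cache avoided spending
  tokens: number;      // cumulative LLM tokens
  usd: number;         // rough cost from the rate table
}

function summarizeRuns(entries: RunLogEntry[]): RunSummary {
  const summary: RunSummary = { wallMs: 0, cachedCount: 0, savedMs: 0, tokens: 0, usd: 0 };
  for (const entry of entries) {
    if (entry.cached) {
      summary.cachedCount += 1;
      summary.savedMs += entry.wallMs;
    } else {
      summary.wallMs += entry.wallMs;
    }
    if (entry.llm) {
      const { model, inputTokens, outputTokens } = entry.llm;
      summary.tokens += inputTokens + outputTokens;
      const rate = RATES_PER_MTOK[model];
      // Models missing from the table add tokens but no cost.
      if (rate) {
        summary.usd += (inputTokens * rate.input + outputTokens * rate.output) / 1e6;
      }
    }
  }
  return summary;
}
```

Keeping the rate table separate from the summing logic means adding a second model is a one-line change.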

Page handler now orchestrates all commands explicitly so the run log can
be built before rendering. NormalizePanel and ExtractPanel become
presentational (receive their CommandRun as a prop).

This is the "what's happening underneath" the workbench needed — when a
fresh session triggers a real LLM call you see Extract QMD · ran · 7385
ms · MiniMax-M2 · 1002+681t, and on re-visit you see cached (saved ~Nms)
with the original cost visible so you understand what the cache bought.

The page handler now only awaits the cheap inventory before returning.
Everything panel-dependent (parse, extract, enrich) moves into an async
<StageBody> wrapped in <Suspense>. When you click a session pill or a
pipeline chip, chrome (header, picker, pipeline strip, stage header,
prev/next nav) renders instantly while the stage body shows a structured
in-flight skeleton.

Skeleton lists the commands that will run for the active stage with
expected timing — Inventory · Parse session · Extract QMD · Enrich
(LLM). For Enrich the user immediately sees "first run: 5–15 s" so the
wait is explained, not opaque.

Suspense boundary keyed on session+stage so each click resets the
boundary cleanly. RunSummary moves inside StageBody so its log is built
after all panel commands resolve.

This pairs with the existing run trace footer: skeleton = expected
sequence, trace = what actually ran.

Three workbench polish moves prompted by usage:

1. ArtifactPicker client island — Extract / Enrich panels pre-read every
   artifact's content server-side and hand the array to a "use client"
   component that swaps preview body via local state. URL stays in sync
   via router.replace (pushState) so links are shareable, but clicking a
   file does not navigate, does not refetch, does not flash the chrome.
   Verified zero document loads + zero HTTP doc requests on click.

2. Force re-run — runCommand learns an optional { force: true } that
   bypasses the cache lookup. URL ?force=<stageId> threads it through
   to the active stage; CommandSurface gains a "re-run ↻" link in the
   command header. Lets you see Extract / Normalize / Enrich actually
   run instead of always reading "cached".

3. Drop "(mechanical)" labels now that Enrich is split out — Extract's
   single output kind speaks for itself.

Suspense boundary key includes the force param so a re-run triggers the
in-flight skeleton properly.

- parseSessionCommand reads from the head in 64 KiB chunks until it has
  `limit` newlines (or EOF). Previously it read one 128 KiB buffer, which
  capped any session at whatever fit in that window — Normalize at
  limit=14 was always ~20 ms regardless of source size.

- Lift Normalize's record limit from 14 to 1500, matching Extract /
  Enrich so all three share the parse-session cache key. Force-re-running
  Normalize now shows a meaningful workload (~150–340 ms on codex-large
  parsing 1500 records).

- NormalizedStreamBody caps the visible stream at 30 rows + a "N parsed
  but hidden · M not parsed" tail so the page stays compact.

- Force handling expands: ?force=all bypasses every command's cache in
  the active pipeline; ?force=<stageId> still works for a single stage.
  Inventory honors force too via the discover stage id.

- Three new re-run affordances:
  * "re-run all ↻" link in the page header (force=all)
  * "re-run ↻" already exists in each CommandSurface header
  * per-row "re-run ↻" links in the run trace footer, mapping command id
    to its stage so you can rerun any single step from the trace
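
One way the force rules above could be resolved per command, as a sketch. The command ids appear in these commit messages; the stage ids in the lookup table (discover, normalize, extract, enrich) and the helper name isForced are assumptions.

```typescript
// Assumed mapping from command id to the stage that owns it.
const STAGE_OF_COMMAND: Record<string, string> = {
  "inventory": "discover",
  "parse-session": "normalize",
  "extract-qmd": "extract",
  "enrich-session": "enrich",
};

// Decide whether a command should bypass its cache for this request.
function isForced(search: string, commandId: string): boolean {
  const force = new URLSearchParams(search).get("force");
  if (force === null) return false;   // no force param: use the cache
  if (force === "all") return true;   // bypass every command's cache
  return STAGE_OF_COMMAND[commandId] === force; // single-stage re-run
}
```

For example, `?force=normalize` would re-run parse-session while extract-qmd still reads from its cache, and `?force=all` would re-run both.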

Suspense key includes the force param so the in-flight skeleton appears
on force-rerun.

Two responsiveness wins:

1. Run-in-place — RerunLink Client Component wraps every re-run
   affordance (header "re-run all", in-CommandSurface "re-run", per-row
   trace re-runs). It uses useTransition so the previous UI stays mounted
   while the navigation streams in. The link swaps its label to
   "running ↻", gets aria-busy=true, and pulses (animate-pulse +
   status-info-fg) while the request is in flight.

   The stage Suspense key drops the force param so force-rerun navigation
   no longer unmounts the panel — old data stays visible until the new
   render lands. Session / stage switches still show the skeleton (key
   still changes on those).

2. Trace inspect — each run-trace row is now a <details> that expands
   to show the actual shell-equivalent, input summary, output summary,
   and resolved cache key for that command's run. makeRunLogEntry gains
   summarizeInput / summarizeOutput hooks so each command projects its
   inputs and outputs into one-line strings for the inspect drawer.

All three command inputs (parse-session, extract-qmd, enrich-session)
now accept limit: number | "all". Defaults flip to "all" so every
session is parsed end-to-end, not capped at a 1500-record head slice.

readHeadLines streams from the file until EOF when the limit is
Infinity, so memory stays bounded by the file size rather than a fixed
buffer. parseSessionCommand.shell switches to `cat …` when limit is
"all" so the copyable shell line stays honest.
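
A sketch of the chunked head read, under assumptions: the ReadAt callback stands in for a positioned file read (for example over a file descriptor), and the function signature is illustrative rather than the actual readHeadLines. It reads fixed-size chunks until it has seen `limit` newlines or hits EOF, and treats "all" as an unbounded limit.

```typescript
// Stand-in for a positioned file read: returns up to `length` bytes at
// `offset`, and an empty array at EOF. Hypothetical, for illustration.
type ReadAt = (offset: number, length: number) => Uint8Array;

const CHUNK_BYTES = 64 * 1024;
const NEWLINE = 0x0a;

function readHeadLines(
  readAt: ReadAt,
  limit: number | "all",
  chunkBytes: number = CHUNK_BYTES,
): string[] {
  const max = limit === "all" ? Infinity : limit;
  const chunks: Uint8Array[] = [];
  let offset = 0;
  let newlines = 0;
  let eof = false;

  while (newlines < max) {
    const chunk = readAt(offset, chunkBytes);
    if (chunk.length === 0) { eof = true; break; }
    chunks.push(chunk);
    offset += chunk.length;
    for (let i = 0; i < chunk.length; i++) {
      if (chunk[i] === NEWLINE) newlines++;
    }
  }

  // Join once and decode once so multi-byte characters that straddle a
  // chunk boundary decode correctly.
  const bytes = new Uint8Array(offset);
  let pos = 0;
  for (const chunk of chunks) { bytes.set(chunk, pos); pos += chunk.length; }
  const parts = new TextDecoder().decode(bytes).split("\n");

  // Everything but the last part was terminated by a newline. The last part
  // is a complete line only if we reached EOF; otherwise it may be cut off.
  const tail = parts[parts.length - 1] ?? "";
  const lines = parts.slice(0, -1);
  if (eof && tail !== "") lines.push(tail);
  return lines.slice(0, max);
}
```

With a numeric limit the loop stops as soon as enough newlines have been seen, so the amount read depends on the limit rather than on a single fixed-size buffer; with "all" it keeps going until EOF.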

Measured on a fresh process:
- codex-large (13 MiB, 4,220 events) normalize force: ~620 ms
- claude-large (52 MiB, 12,009 events) normalize force: ~5.3 s

The NormalizedStreamBody display cap of 30 rows still applies; tail
text now reflects "parsed but hidden" rather than "not parsed" once
nothing is being skipped.

…x.db

Pipeline gains a fifth stage that turns the QMD sidecar files into a
real, queryable data store. Selecting Index on a session walks every
$TMPDIR/scout-study/qmd/<session>/ directory, splits each markdown file
into H2 sections, and writes rows into a better-sqlite3 db with FTS5.

Schema:
- sessions   (id, harness, indexed_at)
- documents  (id, session_id, kind, path, bytes)
- chunks     (id, document_id, ordinal, source_ref, text)
- chunks_fts (virtual FTS5 over chunks.text, with triggers for sync)
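
The H2 split that feeds the chunks table might look like the sketch below. The Chunk shape here (ordinal, heading, text) is a simplification and `heading` is a hypothetical field; how the real indexer fills source_ref is not shown in this PR. The sketch also does not special-case `## ` lines inside fenced code blocks.

```typescript
interface Chunk {
  ordinal: number; // position within the document, starting at 0
  heading: string; // H2 text; empty for content before the first H2
  text: string;
}

function splitIntoH2Sections(markdown: string): Chunk[] {
  const chunks: Chunk[] = [];
  let heading = "";
  let body: string[] = [];

  const flush = () => {
    const text = body.join("\n").trim();
    // Skip a leading preamble that is completely empty.
    if (heading !== "" || text !== "") {
      chunks.push({ ordinal: chunks.length, heading, text });
    }
  };

  for (const line of markdown.split("\n")) {
    // Exactly two hashes followed by whitespace; "###" does not match.
    const match = /^##\s+(.*)$/.exec(line);
    if (match) {
      flush();
      heading = (match[1] ?? "").trim();
      body = [];
    } else {
      body.push(line);
    }
  }
  flush();
  return chunks;
}
```

Each returned chunk would then become one row in chunks, with the FTS5 table kept in sync by the triggers mentioned in the schema.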

Tried `bun --bun next dev` first for bun:sqlite; it ships but breaks the
client Router with "Router action dispatched before initialization"
errors, so the RerunLink useTransition pattern fails. Reverted to
`next dev` + better-sqlite3 — boring, works, FTS5 included.

IndexPanel shows the schema with live row counts, plus a "this session"
breakdown so you can see what just landed. Run trace footer treats
index-corpus as a regular command with input / output summaries and a
re-run link.

Verified end to end with a sqlite3 CLI snippet from outside the app:
- 1.17 MB db file on disk after one session
- chunks_fts MATCH 'VOX' returns real source-anchored snippets from
  decisions.md and events-001.md

New /studies/data surface that makes the shape of the session-search
index legible: schema, shortcuts deck, unified MATCH/SELECT Query card
with a strategy registry, and an Ask field that turns a question into
FTS5 hits via local stopword stripping — sub-millisecond, no proprietary
tokeniser in the loop. Same Ask surface also lands as Step 6 in the
session-search pipeline workbench with schema-aware suggestion chips.
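
A minimal sketch of the Ask step, turning a question into an FTS5 MATCH expression by local stopword stripping. The stopword list, the OR-joining of terms, and the function name are assumptions; the ASCII-only tokenization is a simplification of whatever the real surface does.

```typescript
// Small illustrative stopword list; the real list is not shown in this PR.
const STOPWORDS = new Set([
  "a", "an", "the", "is", "are", "was", "were", "what", "which", "who",
  "how", "why", "when", "where", "did", "do", "does", "we", "i", "you",
  "it", "to", "of", "in", "on", "for", "about", "and", "or", "with",
]);

function questionToMatch(question: string): string {
  const terms = question
    .toLowerCase()
    .split(/[^a-z0-9_]+/) // ASCII-only tokenization, a simplification
    .filter((term) => term.length > 1 && !STOPWORDS.has(term));
  const unique = Array.from(new Set(terms));
  // Quote each term so FTS5 treats it as a string literal, then OR them.
  return unique.map((term) => `"${term}"`).join(" OR ");
}
```

A caller would pass the result as the parameter of a `chunks_fts MATCH ?` query and skip the query entirely when the result is empty, since an empty MATCH expression is not a valid FTS5 query.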

vercel Bot commented Jun 1, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| openscout | Skipped | Skipped | Jun 1, 2026 5:04pm |

arach merged commit 1f84868 into main Jun 1, 2026
3 checks passed
arach deleted the codex/search-workbench-studio-view branch June 1, 2026 17:06