Starter cards were silently sabotaging the PotableLM persona. Trimmed them and noted the underlying portability problem.
- What was wrong: the "Build a Tool" and "Write a SOP" cards in `session-surface.tsx` were each dumping ~60 lines of intake-form prompt into the composer ("Step 1 — present the intake template below exactly as written..."). That prompt overrode `AGENTS.md`'s rewritten doctrine ("skip the intake form, use the profile, build in ~3 short messages"). End result: clicking a card looked indistinguishable from copy-pasting a template into a browser AI, and none of the persona's value came through.
- Fix: card prompts are now terse triggers. Build a Tool → "Help me build a plant tool."; Write a SOP → "Help me write an SOP.". `AGENTS.md` (offer-menu → ONE question → build → confirm) runs unopposed.
- Underlying issue (persona portability): the PotableLM persona, plant profile, and tool catalog live in `~/.config/opencode/AGENTS.md` + `plant-profile.md` (loaded via `opencode.json` `instructions`). That's user-global, not app-bundled. Consequences:
  - On a fresh machine without the config restore, the cards become very thin triggers and the AI behaves like a generic assistant. The `config-snapshot/` README is the restore path for now.
  - Opening the `potable_work` source repo itself as a workspace double-loads the global `AGENTS.md` and the repo's dev-guide `AGENTS.md`, so voice and rules conflict. Don't open it as an operator workspace.
  - If the user switches to a model that handles system prompts loosely, the doctrine softens. Haiku 4.5 has held it well in testing.
- Follow-up options when ready to ship beyond this machine (not done now): (a) bundle the persona + tool catalog as a workspace blueprint so any new workspace gets it without touching `~/.config/opencode/`; (b) inject it as an app-level system prompt in `apps/app/src/app/constants.ts` so it loads even with no `AGENTS.md` present; (c) ship a "first-run" copy of the three config files into `~/.config/opencode/` on install. (a) is the lightest, (b) is the most defensive, (c) plays nicest with users who already use opencode for other things.
PotableLM persona, plant profile, and HTML tool auto-preview wired end-to-end. The app now opens with the operator's facility context already loaded and produces tools that pop straight into the browser.
- Global system prompt (PotableLM persona): `C:\Users\wests\.config\opencode\AGENTS.md` holds the operator-voice persona, answer-structure rules, conventions (no commercial brands, no em dashes, units mandatory), safety doctrine (supervisor verification for dosing / public health calls), and what PotableLM is not (not a substitute for a licensed operator, SOPs, primacy agency). opencode auto-loads global `AGENTS.md` into every session, every workspace, every model. Caveat: opening the `potable_work` source repo itself as a workspace double-loads with its repo `AGENTS.md` (OpenWork dev guide); don't.
- Plant profile auto-load: `C:\Users\wests\.config\opencode\plant-profile.md` is the operator's facility reference (currently a 160 MGD CA conventional plant: 5 sed basins, 12 dual-media filters at 2,000 sf each, 0.7 MG clearwell + 2× 1.2 MG serpentine CT, chloraminated finished at 3.3 mg/L total Cl2, pH 8.9). Loaded into every session via `C:\Users\wests\.config\opencode\opencode.json` with `instructions: ["...plant-profile.md"]`.
- Why not "ask first, then read at runtime": tried that first. opencode's read tool aborted on a path outside the workspace, so the AI fell back to manually quizzing the operator. `instructions` auto-load is reliable and removes permission friction. `AGENTS.md` still teaches the AI to treat `(default, confirm)` entries as soft-assumed and `[TBD]` as real gaps to flag (never invent).
- AGENTS.md tool-building doctrine: skip the intake-form ritual. Skip the "review before I code" pause for compact tools. Profile values are built-in defaults, never user questions. Only expose real-time-variable inputs in tools (current flow, current headloss, current measurement); static plant config stays baked in. HTML artifacts go to `tools/<name>.html`; SOPs go to `docs/<name>.md`.
- HTML tool auto-preview watcher: `electron/main.mjs` `setupToolsWatcher()` watches `<workspace>/tools/*.html`. On a new or overwritten file, `shell.openExternal('file:///...')` pops it open in the operator's default browser. Per-file 30 s debounce so AI iteration doesn't focus-steal. Registered on boot and on every workspace activation IPC. Reason: chat surface has no inline HTML rendering, and adding one means touching the markdown renderer we just stabilized. External-browser pop is the lowest-risk preview and matches the basin-sim launch pattern operators already know. Change is live for `demo-launch.ps1` (loads `main.mjs` from source); the packaged win-unpacked exe needs `pnpm package:electron:dir` to rebuild `app.asar` before it picks up.
- AI streaming response (final fix): earlier polling-gate workaround was scrapped. The real bug was `handleSend` in `session-surface.tsx` calling `setSending(false)` right after `onSendDraft` resolved, but that promise resolves on prompt-accept, not response-complete, so polling died before responses arrived. Fix: leave `sending` true after send and let the existing `liveStatus → idle` effect clear it. Added `snapshot` to that effect's deps so fast responses (busy window between polls) are also caught. Composer focus during AI streaming remains a known acceptable cost; structural / `React.memo` refactor deferred.
- Tool catalog + design standards baked into AGENTS.md: the global PotableLM prompt now carries the full operator-tool design spec (one-screen layout, fixed units with hidden conversion, Advanced toggle, traffic-light Pass/Fail at top, single Calculate, self-contained inline HTML, print stylesheet) and an 11-tool catalog (CT compliance / coagulant dose / lime-soda-ash & LSI/RSI/CCPP / chlorine demand-breakpoint / chlorine-chloramine dose / fluoride feed / filter loading-UFRV-backwash / basin detention-T10 / distribution decay-water age / Pb-Cu 90th percentile / daily MOR). Each catalog entry lists real-time inputs (exposed) vs profile-baked defaults (never asked). Operator-facing flow tightened to: offer menu if no tool named → ONE question ("use plant profile defaults? y/n") → build → one-line confirmation. No intake forms, no review-before-code pauses for compact tools. The big spec is internal to the AI; operator sees ~3 short messages start-to-artifact.
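
The per-file 30 s debounce used by the auto-preview watcher can be illustrated in isolation. This is a Python sketch of the rule only, not the code in `electron/main.mjs`: `PreviewDebouncer` and `should_open` are hypothetical names, and the sketch assumes suppressed rewrites do not restart the window, which this log does not specify.

```python
import time


class PreviewDebouncer:
    """Allow at most one preview launch per file per window (hypothetical helper)."""

    def __init__(self, window_s: float = 30.0, clock=time.monotonic):
        self.window_s = window_s
        self.clock = clock  # injectable so the rule can be tested without sleeping
        self._last_opened: dict[str, float] = {}

    def should_open(self, path: str) -> bool:
        now = self.clock()
        last = self._last_opened.get(path)
        if last is not None and now - last < self.window_s:
            # Same file rewritten inside the window: stay quiet, no focus steal.
            return False
        self._last_opened[path] = now
        return True
```

Keying the timestamp by path is what makes the debounce per-file: rapid rewrites of one tool are suppressed while a different tool written in the same window still opens.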
Fixed potable_work chat UI not showing AI responses until a manual refresh. Most of the debugging time was wasted on a stale bundle — read this before touching it again.
- Real time-sink (stale bundle): Electron was loading a pre-built `dist/` bundle, not the Vite dev server, so every code change appeared to "do nothing." Spot it in DevTools → Network: a hashed JS file like `index-CsJB8kwj.js` + `file:///index.html` = stale build; raw `.tsx` source files + `@react-refresh` + `index.html` from `http://localhost:5173/` = dev server (correct).
- Correct startup order: start Vite first (`potable_work/apps/app`: set `$env:OPENWORK_DEV_MODE = "1"` then `pnpm dev:windows`; plain `pnpm dev` uses Unix env-var syntax and fails on Windows), wait for `Local: http://localhost:5173/`, then run `demo-launch.ps1`. If Vite isn't up when Electron starts, Electron silently falls back to the built bundle.
- Actual bug: opencode's SSE event stream is unreliable here (goes quiet after initial events; known upstream issue), so live UI updates never arrive. Fixes work around it via snapshot polling.
- Fix 1 (`session-render-state.ts`): removed the `messageListContainsAll` shortcut in `deriveRenderedSessionMessages` that preferred empty live SSE stubs over real snapshot text. Now always merges.
- Fix 2 (`session-surface.tsx`): `sseStuck` check (trust polled snapshot status when SSE is stuck on "busy") + 1 s `setInterval` polling of `snapshotQuery.refetch()` (React Query's `refetchInterval` doesn't fire reliably in Electron).
- Dead ends removed, do not re-add: `refetchInterval` / `refetchIntervalInBackground`, `structuralSharing: false` / `gcTime: 0` / `notifyOnChangeProps: "all"`, a `renderTick` re-render counter (made the composer stutter), and a `_t=Date.now()` URL cache-buster (never the problem). Also deleted now-dead `hydratedKeyRef` and a duplicate `seedSessionState` effect.
- Lesson: before debugging "my change had no effect," confirm the running app is actually loading your code.
- Permanent fix (later same day): `electron/main.mjs` now defaults `startUrl` to `http://localhost:5173` whenever `OPENWORK_DEV_MODE=1` and not packaged. Previously it only used the dev server if `OPENWORK_ELECTRON_START_URL` was explicitly set (only `demo-launch.ps1` did that). Launching Electron any other way fell back to the stale `app/dist` build. Now dev mode can't load the stale bundle; if Vite isn't up it fails loudly instead. Still: start Vite before Electron.
- Use mode vs dev mode (final setup): `demo-launch.ps1` no longer sets `OPENWORK_DEV_MODE`. It loads the prebuilt `apps/app/dist` bundle via `loadFile`, no dev server needed, seamless for normal use. After changing UI code, rebuild that bundle: from `apps/app`, `OPENWORK_ELECTRON_BUILD=1 pnpm build` (the env var sets Vite `base: "./"` for relative asset paths; `loadFile` uses `file://`, so absolute `/assets/...` paths white-screen). Plain `pnpm build:ui` from root omits that env var, so don't use it for the demo-launch bundle. For active development: run the Vite dev server + `OPENWORK_DEV_MODE=1` (hot reload).
Restructured the municipal taxonomy to 16 domain-organized categories. Updated all toolchain scripts, reference documents, and existing records to match.
- Replaced the previous 16-category flat taxonomy with a new set organized by cognitive task and failure mode independence. Full rationale in TAXONOMY.md.
- Rewrote `validate.py`: checks required metadata fields, validates category against the 16 new values, enforces system/user/assistant message role order, exits non-zero on any error for CI.
- Rewrote `stats.py`: approved-records-only coverage report with per-category counts, below-target flags, source type and difficulty breakdown.
- Rewrote `new_record.py`: interactive numbered category picker, subcategory/difficulty/tags prompts.
- Rewrote `batch_scaffold.py`: reads a seeds file (one scenario description per line), prompts once for category and difficulty for the whole batch.
- Rewrote `TAXONOMY.md`: full definitions, subcategory directions, and design rationale for each category.
- Updated `README.md`: new taxonomy table, two-track model plan, project structure, sponsor section.
- Updated `CLAUDE.md`: new category enum list, removed stale references to Jaccard duplicate detection.
- Reclassified wt-0002 from `math_and_calculations` to `disinfection_and_oxidation`.
- Reclassified wt-0003 from `taste_and_odor` to `water_source_and_reservoir_management`.
- Added sponsor acknowledgment: Robot Garden, Livermore CA (robotgarden.org).
- Decision: `pH_and_alkalinity` and `SCADA_and_controls_infrastructure` use mixed case intentionally. All other categories and all subcategories remain snake_case.
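
The checks listed for the rewritten `validate.py` can be sketched as follows. This is a minimal stdlib illustration, not the script itself: the record layout (`metadata` and `messages` keys), the `REQUIRED_METADATA` tuple, and the function names are assumptions, and only four of the 16 categories (those named in this entry) are listed.

```python
import sys

# Assumption: the authoritative field list lives in SCHEMA.md.
REQUIRED_METADATA = ("id", "category")
# Four of the 16 category values, for illustration only.
ALLOWED_CATEGORIES = {
    "pH_and_alkalinity",
    "SCADA_and_controls_infrastructure",
    "disinfection_and_oxidation",
    "water_source_and_reservoir_management",
}


def validate_record(record: dict) -> list[str]:
    """Return a list of human-readable errors; an empty list means valid."""
    errors = []
    meta = record.get("metadata", {})
    for field in REQUIRED_METADATA:
        if field not in meta:
            errors.append(f"missing metadata field: {field}")
    category = meta.get("category")
    if category is not None and category not in ALLOWED_CATEGORIES:
        errors.append(f"unknown category: {category}")
    roles = [m.get("role") for m in record.get("messages", [])]
    # Expect one system message, then alternating user/assistant turns.
    if not roles or roles[0] != "system":
        errors.append("first message must have role 'system'")
    for i, role in enumerate(roles[1:]):
        expected = "user" if i % 2 == 0 else "assistant"
        if role != expected:
            errors.append(f"message {i + 1}: expected role '{expected}', got '{role}'")
    return errors


def main(records: list[dict]) -> int:
    """Print every error to stderr and return a process exit code."""
    failed = 0
    for record in records:
        for error in validate_record(record):
            print(error, file=sys.stderr)
            failed += 1
    return 1 if failed else 0  # non-zero fails the CI job
```

Collecting all errors before returning, instead of stopping at the first one, lets a single CI run report every problem in a batch.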
Built the complete dataset toolchain. The project now has working infrastructure for authoring, validating, exporting, and evaluating training data.
- Added `validate.py`: schema validation against all SCHEMA.md rules, token-length heuristic, unique ID checks, Jaccard similarity duplicate detection on user prompts.
- Added `export.py`: strips metadata to produce clean training JSONL. Filters by status, category, difficulty, version. Optional system prompt injection.
- Added `stats.py`: coverage report by category, subcategory, difficulty, source type, review status. Token estimate stats (min/max/mean/median). Gap detection against full taxonomy.
- Added `eval.py`: golden evaluation framework with must_contain/must_not_contain checks. Three starter eval cases (filtration, calculations, disinfection).
- Added `new_record.py`: interactive CLI scaffolder with auto-incrementing IDs and canonical system prompt injection.
- Added `batch_scaffold.py`: batch record creation from a seeds file. Dry-run mode for planning.
- Created `data/system_prompt.txt`: minimal canonical system prompt (two sentences).
- Created `data/seeds/municipal_starter.txt`: 26 planned records across all 16 municipal categories.
- Added GitHub Actions CI workflow (`validate.yml`) to run validation on push/PR.
- Added Makefile with targets for all common operations.
- Added `docs/IDEAS.md` for future exploration directions (facility dataset creation, gamified capture, AutoAgent/Meta-Harness agent frameworks).
- Three test fixture records in `data/raw/` (draft, ai_generated) to prove the toolchain.
- Updated README with current repo layout and toolchain quickstart.
- Added CLAUDE.md with full project instructions for Claude Code sessions.
- Added `.editorconfig` for consistent line endings and indentation.
- Expanded golden eval set to 8 cases covering: filtration, math/calculations, disinfection, safety (confined space entry), regulations (IESWTR turbidity), troubleshooting (multi-filter pattern), plant operations (shift recovery), emergency response (contamination suspicion).
- Decision: one record per `.json` file (better for git diffs and review).
- Decision: pure stdlib Python 3.10+, no external dependencies.
- Decision: minimal system prompt — let fine-tuning internalize voice rather than overloading the system message.
- Decision: eval cases use keyword checks (must_contain/must_not_contain) as the initial scoring method. Semantic scoring deferred until inference pipeline exists.
- Pushed all work to GitHub.
- Next: author real operator content using the seeds file as a guide. Run `python3 scripts/batch_scaffold.py data/seeds/municipal_starter.txt` to create blank records.
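
The keyword-based scoring chosen for the eval cases can be sketched like this. It is an illustration under assumptions, not the contents of `eval.py`: `score_response` is a hypothetical name, the case is a plain dict with `must_contain` and `must_not_contain` lists, and case-insensitive substring matching is assumed.

```python
def score_response(response: str, case: dict) -> dict:
    """Score one model response against one golden eval case."""
    text = response.lower()  # assumption: matching is case-insensitive
    missing = [kw for kw in case.get("must_contain", []) if kw.lower() not in text]
    forbidden = [kw for kw in case.get("must_not_contain", []) if kw.lower() in text]
    return {
        "passed": not missing and not forbidden,
        "missing": missing,      # required keywords that never appeared
        "forbidden": forbidden,  # banned keywords that did appear
    }
```

A made-up case (not one of the golden cases in the repo) shows the shape of the result:

```python
case = {"must_contain": ["turbidity", "NTU"], "must_not_contain": ["guaranteed safe"]}
result = score_response("Check filter effluent turbidity; report it in NTU.", case)
# result["passed"] is True; "missing" and "forbidden" are both empty lists
```

Returning the missing and forbidden keywords along with the pass flag keeps failures debuggable until semantic scoring replaces this method.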
Established the initial public project scaffold for Potable.
- Created the GitHub repository structure.
- Drafted the main project README.
- Drafted a Hugging Face dataset card for Potable Dataset.
- Drafted a Hugging Face model card for PotableLM.
- Added overview and roadmap documents.
- Added dataset reference docs: schema, taxonomy, style guide, annotation guide, and changelog.
- Standardized naming around:
  - Project: Potable
  - Dataset: Potable Dataset
  - Models: PotableLM
Add new entries at the top using this structure:
## YYYY-MM-DD
Short summary sentence.
- concrete change or decision
- concrete change or decision
- open question or next step