feat: per-deployment learnings persistence for sysops by josephfung · Pull Request #30 · josephfung/trimkit

josephfung · 2026-05-12T18:33:28Z

Closes #7

Summary

Adds per-deployment learnings persistence so the sysops agent can remember server quirks between sessions:

- trimkit-learnings-log: appends a structured learning entry to ~/.claude/sysops/learnings.jsonl
- trimkit-learnings-search: reads/deduplicates the learnings store; --human for formatted output
- trimkit-sysops-log-search: reads the audit log; replaces the ~80-line inline Python viewer in SKILL.md
- /sysops learnings sub-command: view stored learnings per deployment
- agents/sysops.md: agent loads learnings before checks/maintenance and writes them when relevant
- skills/sysops/SKILL.md: reduced from ~170 lines of inline Python to a ~30-line routing layer that calls the bin scripts

Why the refactor

PR #20 had the viewer logic duplicated: once in trimkit-learnings-search (JSONL) and again inline in SKILL.md (human-readable). Adding --human flags to the bin scripts eliminates the duplication. The same pattern was applied to the existing /sysops log viewer, which is now extracted into trimkit-sysops-log-search.

Test plan

- 113 BATS tests all pass (npx bats tests/bin/)
- /sysops log shows formatted audit entries or "no log found"
- /sysops log Pulse --last 5 filters and limits correctly
- /sysops learnings shows formatted learnings or "no learnings stored"
- /sysops learnings Pulse filters to one deployment
- trimkit-learnings-search (no --human) still outputs JSONL, so agent integration is unbroken

Summary by CodeRabbit

Release Notes

New Features
- Added deployment learnings storage to capture and retrieve insights from past operations
- Introduced /sysops learnings command to query stored deployment knowledge
- Enhanced /sysops log command with improved filtering and formatting options
- Sysops agent now automatically loads relevant learnings before handling deployments and persists new insights from operations
Documentation
- Expanded guides for log search and learnings features with usage examples and configuration options

- trimkit-learnings-log: reads JSON from stdin, injects ts, appends to ~/.claude/sysops/learnings.jsonl (same atomic append pattern as trimkit-sysops-log)
- trimkit-learnings-search: reads learnings.jsonl, deduplicates by key (latest entry wins), filters by --deployment, outputs JSONL to stdout
- Both scripts support TRIMKIT_SYSOPS_LEARNINGS_DIR / _FILE env overrides for test isolation
- 22 bats tests covering appending, dedup, filtering, edge cases, and concurrent writes
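
The write path described above can be sketched in Python as follows. This is a minimal illustration, not the script's actual code: the helper names `learnings_path` and `append_learning`, the ISO-8601 UTC format for `ts`, the precedence of the `_FILE` override over `_DIR`, and the use of `O_APPEND` for the append are all assumptions. The PR only states that `ts` is injected and that an "atomic append pattern" is reused from trimkit-sysops-log.

```python
import json
import os
from datetime import datetime, timezone
from pathlib import Path


def learnings_path() -> Path:
    """Resolve the store location, honouring the test-isolation overrides."""
    # Assumption: the _FILE override takes precedence over the _DIR override.
    file_override = os.environ.get("TRIMKIT_SYSOPS_LEARNINGS_FILE")
    if file_override:
        return Path(file_override)
    dir_override = os.environ.get("TRIMKIT_SYSOPS_LEARNINGS_DIR")
    directory = Path(dir_override) if dir_override else Path.home() / ".claude" / "sysops"
    return directory / "learnings.jsonl"


def append_learning(entry: dict, path: Path) -> dict:
    """Inject a timestamp and append the entry as one JSONL line."""
    record = dict(entry)
    # Assumption: ts is an ISO-8601 UTC string; the real format is not shown in the PR.
    record["ts"] = datetime.now(timezone.utc).isoformat()
    line = (json.dumps(record) + "\n").encode("utf-8")
    path.parent.mkdir(parents=True, exist_ok=True)
    # Assumption: one write on an O_APPEND descriptor stands in for the
    # "atomic append pattern"; concurrent writers then add whole lines
    # at the end of the file on typical local filesystems.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        os.write(fd, line)
    finally:
        os.close(fd)
    return record
```

Calling `append_learning({"deployment": "Pulse", "key": "example-key"}, learnings_path())` would add one line to the store; the entry contents here are placeholders.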

At the start of processing each deployment, the agent now calls trimkit-learnings-search to surface stored quirks and known-safe items before running checks or maintenance. After status checks and maintenance, the agent writes a learning via trimkit-learnings-log when it observes something worth persisting: known-safe containers, post-reboot restart quirks, procedure deviations, and similar server-specific context. Learnings are written silently (no user prompt) and noted inline in the report. Loading/writing are both guarded with `command -v` so the agent degrades gracefully when trimkit is not installed.

skills/sysops/SKILL.md:
- Add learnings branch: /sysops learnings [Deployment]
- Read-only viewer with inline Python3; deduplicates by key (latest wins) and formats each entry with type, confidence, source, and recorded timestamp, consistent with the /sysops log pattern

sysops/README.md:
- Document the learnings store: location, full schema, example entry, /sysops learnings command syntax
- Document trimkit-learnings-log and trimkit-learnings-search scripts and their test env-var overrides

1. Cross-deployment key collision: dedup in trimkit-learnings-search and the /sysops learnings skill viewer now uses (deployment, key) tuple instead of key alone. Previously, a learning for Curia with key "shared-key" would suppress the Pulse learning with the same key.
2. trimkit-learnings-search --deployment without a value: silently returned all entries instead of erroring. Now validates $# before accepting the argument and exits 1 with a clear message.
3. trimkit-learnings-log missing key validation: an entry with an empty or missing "key" field is now rejected with exit 1. The key is the dedup anchor; an empty key collapses all keyless entries into one bucket and corrupts search results.

Also adds 5 new bats tests covering these cases (67 total, all passing).
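
The first fix can be illustrated with a short sketch of latest-wins dedup keyed on the (deployment, key) pair. The function name `dedupe_latest` is hypothetical, and the sketch assumes that file order equals write order (the store is append-only), so "latest" means "last seen". The output ordering is also an assumption.

```python
def dedupe_latest(entries):
    """Keep only the last entry seen for each (deployment, key) pair."""
    latest = {}
    for entry in entries:
        pair = (entry.get("deployment"), entry.get("key"))
        # Remove any earlier entry first so the survivor is ordered by its
        # most recent write (dicts preserve insertion order).
        latest.pop(pair, None)
        latest[pair] = entry
    return list(latest.values())


entries = [
    {"deployment": "Curia", "key": "shared-key", "insight": "first Curia note"},
    {"deployment": "Pulse", "key": "shared-key", "insight": "Pulse note"},
    {"deployment": "Curia", "key": "shared-key", "insight": "updated Curia note"},
]
result = dedupe_latest(entries)
```

With key-only dedup, the three entries above would collapse to one. With the pair as the dedup anchor, the Pulse entry survives alongside the updated Curia entry.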

agents/sysops.md:
- Load-learnings sections (status check + maintenance): add handling for non-zero exit from trimkit-learnings-search ("learnings unavailable" note); surface stderr warnings under Known quirks heading
- Write-learnings Call 2: distinguish between "tool not on PATH" (skip silently, omit note) and "tool found but failed" ("learning write failed: <key>" in report), mirroring the audit log pattern
- Temp file: use session-scoped path /tmp/trimkit-sysops-learning-<SESSION>.json to prevent parallel invocation collisions

bin/trimkit-learnings-log:
- Validate deployment, type, and insight fields in addition to key
- Reject unknown type values (must be quirk|known-safe|procedure|warning)

tests: add 3 new bats tests for the new validations (70 total, all passing)
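
The field and type checks listed for bin/trimkit-learnings-log might look roughly like the sketch below. The function name `validation_error` and the error strings are hypothetical, and requiring each field to be a non-empty string is an assumption that is somewhat stricter than a plain truthiness check. The allowed type values are taken from the commit message.

```python
ALLOWED_TYPES = {"quirk", "known-safe", "procedure", "warning"}
REQUIRED_FIELDS = ("deployment", "key", "type", "insight")


def validation_error(obj):
    """Return a message describing the first problem found, or None if valid."""
    if not isinstance(obj, dict):
        return "entry must be a JSON object"
    for field in REQUIRED_FIELDS:
        value = obj.get(field)
        # Assumption: every required field must be a non-empty string.
        if not isinstance(value, str) or not value.strip():
            return f"missing or empty field: {field}"
    if obj["type"] not in ALLOWED_TYPES:
        return f"unknown type: {obj['type']}"
    return None
```

A caller would print the returned message to stderr and exit 1 when it is not None, which matches the exit behaviour described for the missing-key case.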

- bin/trimkit-learnings-search: fix header: dedup is by (deployment, key) pair, not bare key; missing-file case exits 0 with no output (no stderr message was ever written; the comment was wrong)
- bin/trimkit-learnings-log: fix header to document the correct dedup scope
- sysops/README.md: fix trimkit-learnings-search description to say (deployment, key) pair
- skills/sysops/SKILL.md: fix error handlers to print to stderr (not stdout) so the agent correctly distinguishes errors from learnings output
- tests: add 4 new bats tests: missing insight validation, empty-string --deployment, corrupt JSONL skipped with valid entries still surfaced, corrupt JSONL emits a stderr warning (74 total, all passing)
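
The corrupt-line behaviour exercised by the new tests (valid entries still surfaced, a warning on stderr) can be sketched as a tolerant JSONL reader. The function name `read_jsonl_tolerant`, the warning text, and the choice to treat non-object JSON values as corrupt are assumptions made for illustration.

```python
import json
import sys


def read_jsonl_tolerant(lines, warn=sys.stderr):
    """Parse JSONL lines, skipping corrupt ones and warning once at the end."""
    entries = []
    skipped = 0
    for line in lines:
        text = line.strip()
        if not text:
            continue  # blank lines are ignored, not counted as corrupt
        try:
            obj = json.loads(text)
        except json.JSONDecodeError:
            skipped += 1
            continue
        if isinstance(obj, dict):
            entries.append(obj)
        else:
            # Assumption: a bare number, string, or list is not a valid entry.
            skipped += 1
    if skipped:
        # Warnings go to the warn stream so stdout stays clean JSONL.
        print(f"warning: skipped {skipped} corrupt line(s)", file=warn)
    return entries
```

Keeping the warning off stdout matters for the same reason as the SKILL.md fix above: the agent reads stdout as learnings output.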

Adds human-readable formatted output mode alongside the existing JSONL output. When --human is passed, the script outputs formatted text with deployment, type, key, confidence, source, and timestamp — plus insight on a separate line. This moves the viewer logic out of SKILL.md and into the testable bin script.

Extracts the audit log reading/formatting logic from SKILL.md into a standalone, testable bin script. Supports --deployment filter, --last N limit, and --human flag for formatted output. JSONL output by default for machine consumption.
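
A minimal sketch of the filter-then-limit selection follows. The function name `select_entries` is hypothetical. Applying the deployment filter before the `--last N` limit, and matching the deployment name exactly, are assumptions; the PR does not spell out either detail.

```python
def select_entries(entries, deployment=None, last=None):
    """Filter by deployment, then keep only the final `last` entries."""
    if deployment is not None:
        # Assumption: exact, case-sensitive match on the deployment name.
        selected = [e for e in entries if e.get("deployment") == deployment]
    else:
        selected = list(entries)
    if last is not None:
        if last < 0:
            raise ValueError("last must be non-negative")
        # A slice of [-0:] would return everything, so handle zero explicitly.
        selected = selected[-last:] if last > 0 else []
    return selected
```

Under these assumptions, `/sysops log Pulse --last 5` corresponds to `select_entries(entries, deployment="Pulse", last=5)`, which returns the five most recent Pulse entries rather than the Pulse entries among the five most recent overall.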

Replaces inline Python viewers for /sysops log and /sysops learnings with calls to the bin scripts (trimkit-sysops-log-search --human and trimkit-learnings-search --human). SKILL.md is now a thin routing layer that parses arguments and delegates to the appropriate script.

Adds 7 tests for trimkit-learnings-search --human flag covering formatted output, filtering, empty/missing file messages, dedup, and corrupt line warnings. Creates 29-test suite for the new trimkit-sysops-log-search covering JSONL output, --deployment filter, --last limit, --human formatting, error handling, and edge cases.

Adds trimkit-sysops-log-search section to the sysops README and updates the trimkit-learnings-search section to show --human usage.

- trimkit-learnings-log: wrap json.load() in try-except to emit a clean error message on invalid JSON instead of a raw Python traceback
- trimkit-learnings-log: validate confidence field (must be a number 0.0-1.0)
- agents/sysops.md: note learnings write skip (tool not on PATH) instead of silently omitting the note, aligning with the audit log pattern
- SKILL.md: add error handling guidance when bin scripts exit non-zero
- Remove unused `os` import from Python blocks in search scripts
- Add CHANGELOG entries for the new features
- Add tests for invalid JSON, missing confidence, and out-of-range confidence

josephfung · 2026-05-12T18:42:12Z

@coderabbitai review

coderabbitai · 2026-05-12T18:42:18Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai · 2026-05-12T18:46:54Z

📝 Walkthrough

Walkthrough

This pull request implements persistent memory for the sysops agent: an append-only JSONL store at ~/.claude/sysops/learnings.jsonl that captures deployment-specific quirks, procedures, and warnings. Three new shell utilities (trimkit-learnings-log, trimkit-learnings-search, trimkit-sysops-log-search) handle writing and querying entries. The sysops playbook now loads relevant learnings before starting status checks and maintenance, then writes newly discovered patterns after completion. User-facing /sysops log and /sysops learnings commands route through the skill to those utilities. Tests cover validation, deduplication, filtering, corruption handling, and concurrent writes across the entire suite.

🚥 Pre-merge checks | ✅ 4

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the primary feature: per-deployment learnings persistence for sysops, matching the core changeset. |
| Linked Issues check | ✅ Passed | The PR comprehensively implements all coding requirements from issue `#7`: learnings JSONL storage, trimkit-learnings-log/search scripts, agent integration, /sysops learnings command, and deduplication logic. |
| Out of Scope Changes check | ✅ Passed | All changes are directly aligned with issue `#7` objectives: learnings scripts, agent integration, skill routing, documentation, and comprehensive BATS test coverage. |


coderabbitai

Actionable comments posted: 7

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@bin/trimkit-learnings-log`:
- Around line 57-60: Validation currently checks only ('deployment', 'key',
'type', 'insight') so the required "source" field is skipped; update the field
list in the validation loop in bin/trimkit-learnings-log (the for field in (...)
block that calls obj.get(field)) to include "source" so the script exits with an
error if obj.get('source') is missing or empty, maintaining the same error
message format for consistency.
- Around line 66-67: The validation wrongly accepts booleans because
isinstance(True, (int, float)) is True; update the conditional that checks conf
(the existing if conf is None or not isinstance(conf, (int, float)) or not (0.0
<= conf <= 1.0):) to explicitly reject bools (e.g., ensure conf is a number and
not an instance of bool) before checking the 0.0–1.0 range so that True/False do
not pass validation; keep the existing error message behavior when the check
fails.

In `@bin/trimkit-learnings-search`:
- Around line 63-64: Update the two echo messages in the
bin/trimkit-learnings-search script so they reference the configured variable
TRIMKIT_SYSOPS_LEARNINGS_FILE instead of the hardcoded
"~/.claude/sysops/learnings.jsonl"; locate the conditional/usage around the
existing echo lines and replace the literal path text with the variable (or an
evaluated/display-friendly form of it) so users see the actual file path the
script is using.

In `@bin/trimkit-sysops-log-search`:
- Around line 78-79: The hardcoded message prints "~/.claude/sysops/audit.jsonl"
even when TRIMKIT_SYSOPS_LOG_FILE is set; update the script to determine the
effective path (e.g. set a LOG_FILE variable from TRIMKIT_SYSOPS_LOG_FILE with a
fallback to "$HOME/.claude/sysops/audit.jsonl") and use that LOG_FILE in the
echo lines instead of the literal string so the message reflects any env
override; ensure tilde expansion by using $HOME rather than "~" when building
the fallback.

In `@skills/sysops/SKILL.md`:
- Line 8: Change the ambiguous "starts with `log` or `learnings`" routing rule
to explicitly state "if the first word of the argument is `log` or `learnings`"
so routing uses a first-word (token) match rather than a string prefix; update
the SKILL.md text around the /sysops routing rule and any related examples to
mention matching the first word (or using a word-boundary/token check) when
deciding not to delegate to the sysops subagent.

In `@sysops/README.md`:
- Around line 84-86: Add language identifiers to the unlabeled fenced code
blocks to satisfy markdownlint MD040: change the block that contains
"~/.claude/sysops/learnings.jsonl" to use a text fence (```text) and change the
command block that lists "/sysops learnings" examples to use a bash shell fence
(```bash); update both occurrences (the single-path block around
"~/.claude/sysops/learnings.jsonl" and the multi-line commands block around
"/sysops learnings ...") so each opening fence includes the appropriate language
label.
- Line 80: The README's Learnings overview incorrectly states deduplication is
by `key`; update the text to state that deduplication is done per (deployment,
key) so entries with the same `key` in different `deployment`s are kept
separately; reference the Learnings description and the terms `key` and
`deployment` and change the sentence to explicitly say "deduplicated by
(deployment, key): the latest entry for a given key within a deployment
supersedes earlier ones."


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 370188ee-367b-486a-af09-e3c0401a6fee

📥 Commits

Reviewing files that changed from the base of the PR and between a5db8eb and 9b85fc8.

📒 Files selected for processing (10)

CHANGELOG.md
agents/sysops.md
bin/trimkit-learnings-log
bin/trimkit-learnings-search
bin/trimkit-sysops-log-search
skills/sysops/SKILL.md
sysops/README.md
tests/bin/trimkit-learnings-log.bats
tests/bin/trimkit-learnings-search.bats
tests/bin/trimkit-sysops-log-search.bats

- Validate 'source' field as required in trimkit-learnings-log
- Reject boolean confidence values (bool is a subclass of int in Python)
- Use configured $LEARNINGS_FILE/$LOG_FILE in missing-file messages instead of hardcoded default paths
- Tighten SKILL.md routing to "first word" match to prevent prefix collisions (e.g. /sysops logging)
- Fix README dedup scope: (deployment, key), not just key
- Add language tags to unlabeled fenced blocks in README (MD040)
- Add tests: boolean confidence, missing source field
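
The boolean pitfall behind the second item is easy to reproduce: `isinstance(True, (int, float))` is true in Python, and `0.0 <= True <= 1.0` also holds, so a plain numeric range check accepts `true` as a confidence. A sketch of a check that rejects it is shown below; the function name `is_valid_confidence` is hypothetical.

```python
def is_valid_confidence(value):
    """Accept only real numbers in [0.0, 1.0]; booleans are rejected."""
    # bool is a subclass of int, so it must be excluded before the numeric check.
    if isinstance(value, bool):
        return False
    if not isinstance(value, (int, float)):
        return False
    return 0.0 <= value <= 1.0
```

With this ordering, `is_valid_confidence(True)` is false while `is_valid_confidence(1)` and `is_valid_confidence(0.5)` remain true.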

josephfung · 2026-05-12T19:35:37Z

@coderabbitai approve

coderabbitai · 2026-05-12T19:35:50Z

✅ Actions performed

Comments resolved and changes approved.

josephfung added 12 commits May 12, 2026 12:37

docs: document trimkit-sysops-log-search and --human flags

2b89732

Adds trimkit-sysops-log-search section to the sysops README and updates the trimkit-learnings-search section to show --human usage.

coderabbitai Bot requested changes May 12, 2026

coderabbitai Bot approved these changes May 12, 2026

josephfung merged commit 8c739c0 into main May 12, 2026
2 checks passed

josephfung merged 13 commits into main from feat/slim-skill-md
