feat: per-deployment learnings persistence for sysops #30
- trimkit-learnings-log: reads JSON from stdin, injects ts, appends to ~/.claude/sysops/learnings.jsonl (same atomic append pattern as trimkit-sysops-log)
- trimkit-learnings-search: reads learnings.jsonl, deduplicates by key (latest entry wins), filters by --deployment, outputs JSONL to stdout
- Both scripts support TRIMKIT_SYSOPS_LEARNINGS_DIR / _FILE env overrides for test isolation
- 22 bats tests covering appending, dedup, filtering, edge cases, and concurrent writes
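The append step described above could look roughly like the following. This is a minimal sketch, not the script's actual code: the function name `append_learning`, the ISO-8601 UTC format for `ts`, and the file mode are assumptions, and the real "atomic append pattern" in trimkit-sysops-log may differ.

```python
import json
import os
import tempfile
from datetime import datetime, timezone


def append_learning(entry: dict, path: str) -> dict:
    """Append one learning as a single JSONL line, stamping it with `ts`."""
    record = dict(entry)
    # Assumption: `ts` is an ISO-8601 UTC timestamp; the PR only says "injects ts".
    record["ts"] = datetime.now(timezone.utc).isoformat()
    line = (json.dumps(record, sort_keys=True) + "\n").encode("utf-8")

    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    # O_APPEND asks the kernel to position every write at end-of-file, so
    # writers that emit one small write() per record append rather than overwrite.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        os.write(fd, line)
    finally:
        os.close(fd)
    return record


if __name__ == "__main__":
    # Hypothetical usage against a throwaway directory, mirroring the env override idea.
    target = os.path.join(tempfile.mkdtemp(), "sysops", "learnings.jsonl")
    print(append_learning({"deployment": "Pulse", "key": "example-key"}, target))
```

Writing the whole record in one `write()` call is what keeps concurrent appends from interleaving within a line on typical local filesystems; network filesystems may not give the same guarantee.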
At the start of processing each deployment, the agent now calls trimkit-learnings-search to surface stored quirks and known-safe items before running checks or maintenance. After status checks and maintenance, the agent writes a learning via trimkit-learnings-log when it observes something worth persisting: known-safe containers, post-reboot restart quirks, procedure deviations, and similar server-specific context. Learnings are written silently (no user prompt) and noted inline in the report. Loading/writing are both guarded with `command -v` so the agent degrades gracefully when trimkit is not installed.
skills/sysops/SKILL.md:
- Add learnings branch: /sysops learnings [Deployment]
- Read-only viewer with inline Python3; deduplicates by key (latest wins) and formats each entry with type, confidence, source, and recorded timestamp, consistent with the /sysops log pattern
sysops/README.md:
- Document the learnings store: location, full schema, example entry, /sysops learnings command syntax
- Document trimkit-learnings-log and trimkit-learnings-search scripts and their test env-var overrides
1. Cross-deployment key collision: dedup in trimkit-learnings-search and the /sysops learnings skill viewer now uses the (deployment, key) tuple instead of key alone. Previously, a learning for Curia with key "shared-key" would suppress the Pulse learning with the same key.
2. trimkit-learnings-search --deployment without a value: silently returned all entries instead of erroring. Now validates $# before accepting the argument and exits 1 with a clear message.
3. trimkit-learnings-log missing key validation: an entry with an empty or missing "key" field is now rejected with exit 1. The key is the dedup anchor; an empty key collapses all keyless entries into one bucket and corrupts search results.

Also adds 5 new bats tests covering these cases (67 total, all passing).
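The (deployment, key) dedup from fix 1 can be sketched as follows. The function name `dedup_latest` is hypothetical, and the sketch assumes entries arrive in file order, so that "latest" means "last seen" in the append-only store.

```python
def dedup_latest(entries):
    """Keep only the most recent entry for each (deployment, key) pair."""
    latest = {}
    for entry in entries:
        anchor = (entry.get("deployment"), entry.get("key"))
        # Remove and reinsert so output order follows each pair's latest occurrence.
        latest.pop(anchor, None)
        latest[anchor] = entry
    return list(latest.values())


if __name__ == "__main__":
    store = [
        {"deployment": "Curia", "key": "shared-key", "insight": "old"},
        {"deployment": "Pulse", "key": "shared-key", "insight": "pulse"},
        {"deployment": "Curia", "key": "shared-key", "insight": "new"},
    ]
    for row in dedup_latest(store):
        print(row["deployment"], row["insight"])
```

With a bare `key` as the anchor, the Pulse entry above would have been replaced by the later Curia entry, which is the collision the fix describes.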
agents/sysops.md:
- Load-learnings sections (status check + maintenance): add handling
for non-zero exit from trimkit-learnings-search ("learnings
unavailable" note); surface stderr warnings under Known quirks heading
- Write-learnings Call 2: distinguish between "tool not on PATH"
(skip silently, omit note) and "tool found but failed" ("learning
write failed: <key>" in report) — mirrors the audit log pattern
- Temp file: use session-scoped path
/tmp/trimkit-sysops-learning-<SESSION>.json to prevent parallel
invocation collisions
bin/trimkit-learnings-log:
- Validate deployment, type, and insight fields in addition to key
- Reject unknown type values (must be quirk|known-safe|procedure|warning)
tests: add 3 new bats tests for the new validations (70 total, all passing)
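A minimal sketch of the field and type validation listed above, assuming the entry has already been parsed into a dict. The function name and the error strings are hypothetical; only the field names and the four allowed type values come from the commit message.

```python
REQUIRED_FIELDS = ("deployment", "key", "type", "insight")
ALLOWED_TYPES = {"quirk", "known-safe", "procedure", "warning"}


def validate_learning(obj):
    """Return an error message for an invalid entry, or None if it passes."""
    for field in REQUIRED_FIELDS:
        # Treats a missing field and an empty value the same way.
        if not obj.get(field):
            return f"missing or empty field: {field}"
    entry_type = obj["type"]
    if not isinstance(entry_type, str) or entry_type not in ALLOWED_TYPES:
        return f"unknown type: {entry_type!r}"
    return None


if __name__ == "__main__":
    print(validate_learning({"deployment": "Pulse", "key": "k", "type": "note", "insight": "x"}))
```

A caller would print the returned message to stderr and exit 1, matching the behaviour the commit describes for rejected entries.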
- bin/trimkit-learnings-search: fix header: dedup is by (deployment, key) pair, not bare key; missing-file case exits 0 with no output (no stderr message was ever written; the comment was wrong)
- bin/trimkit-learnings-log: fix header: document correct dedup scope
- sysops/README.md: fix trimkit-learnings-search description to say (deployment, key) pair
- skills/sysops/SKILL.md: fix error handlers to print to stderr (not stdout) so the agent correctly distinguishes errors from learnings output
- tests: add 4 new bats tests: missing insight validation, empty-string --deployment, corrupt JSONL skipped with valid entries still surfaced, corrupt JSONL emits a stderr warning (74 total, all passing)
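The corrupt-line behaviour covered by the new tests (skip the bad line, keep the valid entries, warn on stderr) might be implemented along these lines. The function name `read_jsonl` and the warning text are assumptions for illustration.

```python
import json
import sys


def read_jsonl(lines, warn=sys.stderr):
    """Parse JSONL lines, skipping corrupt ones with a warning on `warn`."""
    entries = []
    for lineno, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue  # blank lines are ignored silently
        try:
            entries.append(json.loads(raw))
        except json.JSONDecodeError:
            # Warnings go to stderr so stdout carries only learnings output.
            print(f"warning: skipping corrupt line {lineno}", file=warn)
    return entries


if __name__ == "__main__":
    sample = ['{"key": "a"}', "{not json", '{"key": "b"}']
    print(read_jsonl(sample))
```

Keeping warnings off stdout is what lets the agent treat stdout as pure JSONL, the same separation the SKILL.md error-handler fix is after.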
Adds human-readable formatted output mode alongside the existing JSONL output. When --human is passed, the script outputs formatted text with deployment, type, key, confidence, source, and timestamp — plus insight on a separate line. This moves the viewer logic out of SKILL.md and into the testable bin script.
Extracts the audit log reading/formatting logic from SKILL.md into a standalone, testable bin script. Supports --deployment filter, --last N limit, and --human flag for formatted output. JSONL output by default for machine consumption.
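The filter and limit options could combine as in the sketch below. The function name is hypothetical, and the order of operations (filter by deployment first, then keep the last N) is an assumption; the PR does not spell it out.

```python
def select_entries(entries, deployment=None, last=None):
    """Apply an optional deployment filter, then keep the final `last` entries."""
    selected = list(entries)
    if deployment is not None:
        selected = [e for e in selected if e.get("deployment") == deployment]
    if last is not None:
        # A slice of [-0:] would return everything, so handle non-positive N explicitly.
        selected = selected[-last:] if last > 0 else []
    return selected


if __name__ == "__main__":
    log = [{"deployment": d, "n": i} for i, d in enumerate(["Pulse", "Curia", "Pulse", "Pulse"])]
    print(select_entries(log, deployment="Pulse", last=2))
```

Filtering before limiting means `--last 5` with a deployment filter returns up to five entries for that deployment, rather than whatever subset of the five most recent entries happens to match.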
Replaces inline Python viewers for /sysops log and /sysops learnings with calls to the bin scripts (trimkit-sysops-log-search --human and trimkit-learnings-search --human). SKILL.md is now a thin routing layer that parses arguments and delegates to the appropriate script.
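As a rough illustration of the routing layer, the sketch below dispatches on the first word of the arguments rather than on a string prefix, which is the rule the later review tightens. SKILL.md is prose instructions rather than Python, so this only models the decision; the `sysops-agent` label and the `route` function are hypothetical.

```python
# Viewer sub-commands and the bin script each one delegates to.
VIEWERS = {
    "log": "trimkit-sysops-log-search",
    "learnings": "trimkit-learnings-search",
}


def route(arguments: str):
    """Return (target, remaining_words) for a /sysops invocation."""
    words = arguments.split()
    # Match the whole first token, so "logging" does not route to the log viewer.
    if words and words[0] in VIEWERS:
        return VIEWERS[words[0]], words[1:]
    return "sysops-agent", words


if __name__ == "__main__":
    print(route("log Pulse --last 5"))
    print(route("logging status"))
```

A prefix check such as `arguments.startswith("log")` would misroute `/sysops logging`, which is the collision the first-word rule avoids.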
Adds 7 tests for trimkit-learnings-search --human flag covering formatted output, filtering, empty/missing file messages, dedup, and corrupt line warnings. Creates 29-test suite for the new trimkit-sysops-log-search covering JSONL output, --deployment filter, --last limit, --human formatting, error handling, and edge cases.
Adds trimkit-sysops-log-search section to the sysops README and updates the trimkit-learnings-search section to show --human usage.
- trimkit-learnings-log: wrap json.load() in try-except to emit a clean error message on invalid JSON instead of a raw Python traceback
- trimkit-learnings-log: validate confidence field (must be a number 0.0-1.0)
- agents/sysops.md: note learnings write skip (tool not on PATH) instead of silently omitting the note, aligning with the audit log pattern
- SKILL.md: add error handling guidance when bin scripts exit non-zero
- Remove unused `os` import from Python blocks in search scripts
- Add CHANGELOG entries for the new features
- Add tests for invalid JSON, missing confidence, and out-of-range confidence
@coderabbitai review
✅ Actions performed
Review triggered.
📝 Walkthrough
This pull request implements persistent memory for the sysops agent: an append-only JSONL store at
🚥 Pre-merge checks: ✅ 4 passed
Actionable comments posted: 7
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@bin/trimkit-learnings-log`:
- Around line 57-60: Validation currently checks only ('deployment', 'key',
'type', 'insight') so the required "source" field is skipped; update the field
list in the validation loop in bin/trimkit-learnings-log (the for field in (...)
block that calls obj.get(field)) to include "source" so the script exits with an
error if obj.get('source') is missing or empty, maintaining the same error
message format for consistency.
- Around line 66-67: The validation wrongly accepts booleans because
isinstance(True, (int, float)) is True; update the conditional that checks conf
(the existing if conf is None or not isinstance(conf, (int, float)) or not (0.0
<= conf <= 1.0):) to explicitly reject bools (e.g., ensure conf is a number and
not an instance of bool) before checking the 0.0–1.0 range so that True/False do
not pass validation; keep the existing error message behavior when the check
fails.
In `@bin/trimkit-learnings-search`:
- Around line 63-64: Update the two echo messages in the
bin/trimkit-learnings-search script so they reference the configured variable
TRIMKIT_SYSOPS_LEARNINGS_FILE instead of the hardcoded
"~/.claude/sysops/learnings.jsonl"; locate the conditional/usage around the
existing echo lines and replace the literal path text with the variable (or an
evaluated/display-friendly form of it) so users see the actual file path the
script is using.
In `@bin/trimkit-sysops-log-search`:
- Around line 78-79: The hardcoded message prints "~/.claude/sysops/audit.jsonl"
even when TRIMKIT_SYSOPS_LOG_FILE is set; update the script to determine the
effective path (e.g. set a LOG_FILE variable from TRIMKIT_SYSOPS_LOG_FILE with a
fallback to "$HOME/.claude/sysops/audit.jsonl") and use that LOG_FILE in the
echo lines instead of the literal string so the message reflects any env
override; ensure tilde expansion by using $HOME rather than "~" when building
the fallback.
In `@skills/sysops/SKILL.md`:
- Line 8: Change the ambiguous "starts with `log` or `learnings`" routing rule
to explicitly state "if the first word of the argument is `log` or `learnings`"
so routing uses a first-word (token) match rather than a string prefix; update
the SKILL.md text around the /sysops routing rule and any related examples to
mention matching the first word (or using a word-boundary/token check) when
deciding not to delegate to the sysops subagent.
In `@sysops/README.md`:
- Around line 84-86: Add language identifiers to the unlabeled fenced code
blocks to satisfy markdownlint MD040: change the block that contains
"~/.claude/sysops/learnings.jsonl" to use a text fence (```text) and change the
command block that lists "/sysops learnings" examples to use a bash shell fence
(```bash); update both occurrences (the single-path block around
"~/.claude/sysops/learnings.jsonl" and the multi-line commands block around
"/sysops learnings ...") so each opening fence includes the appropriate language
label.
- Line 80: The README's Learnings overview incorrectly states deduplication is
by `key`; update the text to state that deduplication is done per (deployment,
key) so entries with the same `key` in different `deployment`s are kept
separately; reference the Learnings description and the terms `key` and
`deployment` and change the sentence to explicitly say "deduplicated by
(deployment, key): the latest entry for a given key within a deployment
supersedes earlier ones."
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 370188ee-367b-486a-af09-e3c0401a6fee
📒 Files selected for processing (10)
- CHANGELOG.md
- agents/sysops.md
- bin/trimkit-learnings-log
- bin/trimkit-learnings-search
- bin/trimkit-sysops-log-search
- skills/sysops/SKILL.md
- sysops/README.md
- tests/bin/trimkit-learnings-log.bats
- tests/bin/trimkit-learnings-search.bats
- tests/bin/trimkit-sysops-log-search.bats
- Validate 'source' field as required in trimkit-learnings-log
- Reject boolean confidence values (bool is a subclass of int in Python)
- Use configured $LEARNINGS_FILE/$LOG_FILE in missing-file messages instead of hardcoded default paths
- Tighten SKILL.md routing to "first word" match to prevent prefix collisions (e.g. /sysops logging)
- Fix README dedup scope: (deployment, key), not just key
- Add language tags to unlabeled fenced blocks in README (MD040)
- Add tests: boolean confidence, missing source field
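The boolean-confidence fix relies on checking for `bool` before the numeric check, since `isinstance(True, (int, float))` is `True` in Python. A minimal sketch, with a hypothetical helper name:

```python
def valid_confidence(conf) -> bool:
    """Accept only real numbers in the closed range 0.0 to 1.0."""
    # bool is a subclass of int, so it has to be excluded explicitly.
    if isinstance(conf, bool) or not isinstance(conf, (int, float)):
        return False
    return 0.0 <= conf <= 1.0


if __name__ == "__main__":
    for candidate in (0.5, 1, True, None, "0.5", 1.5):
        print(repr(candidate), valid_confidence(candidate))
```

Without the `bool` guard, `True` would pass as `1` and `False` as `0`, both inside the allowed range.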
@coderabbitai approve
✅ Actions performed
Comments resolved and changes approved.
Closes #7
Summary
- `trimkit-learnings-log`: appends a structured learning entry to `~/.claude/sysops/learnings.jsonl`
- `trimkit-learnings-search`: reads/deduplicates the learnings store; `--human` for formatted output
- `trimkit-sysops-log-search`: reads the audit log; replaces the ~80-line inline Python viewer in SKILL.md
- `/sysops learnings` sub-command: view stored learnings per deployment
- `agents/sysops.md`: agent loads learnings before checks/maintenance and writes them when relevant
- `skills/sysops/SKILL.md`: reduced from ~170 lines of inline Python to a ~30-line routing layer that calls the bin scripts

Why the refactor
PR #20 had the viewer logic duplicated: once in `trimkit-learnings-search` (JSONL) and again inline in SKILL.md (human-readable). Adding `--human` flags to the bin scripts eliminates the duplication. The same pattern was applied to the existing `/sysops log` viewer, which is extracted into `trimkit-sysops-log-search`.

Test plan
- npx bats tests/bin/)
- `/sysops log` shows formatted audit entries or "no log found"
- `/sysops log Pulse --last 5` filters and limits correctly
- `/sysops learnings` shows formatted learnings or "no learnings stored"
- `/sysops learnings Pulse` filters to one deployment
- `trimkit-learnings-search` (no `--human`) still outputs JSONL; agent integration unbroken

Summary by CodeRabbit
Release Notes
New Features
- `/sysops learnings` command to query stored deployment knowledge
- `/sysops log` command with improved filtering and formatting options
Documentation