feat: per-deployment learnings persistence for sysops #30

Merged
josephfung merged 13 commits into main from feat/slim-skill-md
May 12, 2026
@josephfung josephfung commented May 12, 2026

Owner

Closes #7

Summary

  • Adds per-deployment learnings persistence so the sysops agent can remember server quirks between sessions
  • trimkit-learnings-log — appends a structured learning entry to ~/.claude/sysops/learnings.jsonl
  • trimkit-learnings-search — reads/deduplicates the learnings store; --human for formatted output
  • trimkit-sysops-log-search — reads the audit log; replaces the ~80-line inline Python viewer in SKILL.md
  • /sysops learnings sub-command — view stored learnings per deployment
  • agents/sysops.md — agent loads learnings before checks/maintenance and writes them when relevant
  • skills/sysops/SKILL.md — reduced from ~170 lines of inline Python to a ~30-line routing layer that calls the bin scripts

Why the refactor

PR #20 had the viewer logic duplicated: once in trimkit-learnings-search (JSONL) and again inline in SKILL.md (human-readable). Adding --human flags to the bin scripts eliminates the duplication. The same pattern was applied to the existing /sysops log viewer, which is now extracted into trimkit-sysops-log-search.

Test plan

  • 113 BATS tests all pass (npx bats tests/bin/)
  • /sysops log shows formatted audit entries or "no log found"
  • /sysops log Pulse --last 5 filters and limits correctly
  • /sysops learnings shows formatted learnings or "no learnings stored"
  • /sysops learnings Pulse filters to one deployment
  • trimkit-learnings-search (no --human) still outputs JSONL — agent integration unbroken

Summary by CodeRabbit

Release Notes

  • New Features

    • Added deployment learnings storage to capture and retrieve insights from past operations
    • Introduced /sysops learnings command to query stored deployment knowledge
    • Enhanced /sysops log command with improved filtering and formatting options
    • Sysops agent now automatically loads relevant learnings before handling deployments and persists new insights from operations
  • Documentation

    • Expanded guides for log search and learnings features with usage examples and configuration options

josephfung added 12 commits May 12, 2026 12:37
- trimkit-learnings-log: reads JSON from stdin, injects ts, appends to
  ~/.claude/sysops/learnings.jsonl (same atomic append pattern as
  trimkit-sysops-log)
- trimkit-learnings-search: reads learnings.jsonl, deduplicates by key
  (latest entry wins), filters by --deployment, outputs JSONL to stdout
- Both scripts support TRIMKIT_SYSOPS_LEARNINGS_DIR / _FILE env overrides
  for test isolation
- 22 bats tests covering appending, dedup, filtering, edge cases, and
  concurrent writes
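
A minimal Python sketch of the append step described above, for illustration only. The function names, the precedence between the `_FILE` and `_DIR` overrides, the ISO-8601 `ts` format, and the use of `O_APPEND` as the "atomic append pattern" are all assumptions; the commit does not spell them out.

```python
import json
import os
from datetime import datetime, timezone


def resolve_learnings_file(env=None):
    """Pick the store path. Letting _FILE win over _DIR is an assumption."""
    env = os.environ if env is None else env
    explicit = env.get("TRIMKIT_SYSOPS_LEARNINGS_FILE")
    if explicit:
        return explicit
    # Build the default from the home directory rather than a literal "~".
    base = env.get("TRIMKIT_SYSOPS_LEARNINGS_DIR") or os.path.join(
        os.path.expanduser("~"), ".claude", "sysops"
    )
    return os.path.join(base, "learnings.jsonl")


def append_learning(entry, path):
    """Stamp a copy of the entry with `ts` and append it as one JSONL line."""
    record = dict(entry)
    record["ts"] = datetime.now(timezone.utc).isoformat()  # assumed format
    line = (json.dumps(record, ensure_ascii=False) + "\n").encode("utf-8")
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    # With O_APPEND the kernel positions each write at end of file, so
    # concurrent writers of short lines do not overwrite one another.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        os.write(fd, line)
    finally:
        os.close(fd)
    return record
```

Passing an explicit `env` dict to `resolve_learnings_file` mirrors how the overrides are used for test isolation: a test can point the store at a temporary directory without touching the real file.
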
At the start of processing each deployment, the agent now calls
trimkit-learnings-search to surface stored quirks and known-safe
items before running checks or maintenance.

After status checks and maintenance, the agent writes a learning via
trimkit-learnings-log when it observes something worth persisting:
known-safe containers, post-reboot restart quirks, procedure deviations,
and similar server-specific context.

Learnings are written silently (no user prompt) and noted inline in
the report. Loading/writing are both guarded with `command -v` so
the agent degrades gracefully when trimkit is not installed.

skills/sysops/SKILL.md:
- Add learnings branch: /sysops learnings [Deployment]
- Read-only viewer with inline Python3; deduplicates by key (latest
  wins) and formats each entry with type, confidence, source, and
  recorded timestamp — consistent with the /sysops log pattern

sysops/README.md:
- Document the learnings store: location, full schema, example entry,
  /sysops learnings command syntax
- Document trimkit-learnings-log and trimkit-learnings-search scripts
  and their test env-var overrides

1. Cross-deployment key collision: dedup in trimkit-learnings-search and
   the /sysops learnings skill viewer now uses (deployment, key) tuple
   instead of key alone. Previously, a learning for Curia with key
   "shared-key" would suppress the Pulse learning with the same key.

2. trimkit-learnings-search --deployment without a value: silently
   returned all entries instead of erroring. Now validates $# before
   accepting the argument and exits 1 with a clear message.

3. trimkit-learnings-log missing key validation: an entry with an empty
   or missing "key" field is now rejected with exit 1. The key is the
   dedup anchor; an empty key collapses all keyless entries into one
   bucket and corrupts search results.

Also adds 5 new bats tests covering these cases (67 total, all passing).
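
The collision fix in item 1 can be sketched as follows. This is an illustrative Python sketch, not the script's actual code: `dedupe_latest` is a hypothetical name, and treating "latest" as "last in file order" is an assumption that relies on the store being append-only.

```python
def dedupe_latest(entries):
    """Keep the last entry seen for each (deployment, key) pair."""
    latest = {}
    for entry in entries:  # assumed to be in file order, oldest first
        pair = (entry.get("deployment"), entry.get("key"))
        # Remove and re-insert so the survivor sits at its newest position.
        latest.pop(pair, None)
        latest[pair] = entry
    return list(latest.values())


if __name__ == "__main__":
    store = [
        {"deployment": "Curia", "key": "shared-key", "insight": "first"},
        {"deployment": "Pulse", "key": "shared-key", "insight": "pulse"},
        {"deployment": "Curia", "key": "shared-key", "insight": "second"},
    ]
    # Both deployments survive; Curia keeps only its newer entry.
    for entry in dedupe_latest(store):
        print(entry["deployment"], entry["insight"])
```

Keying on the key alone would have left a single entry here, which is the suppression bug the commit describes.
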
agents/sysops.md:
- Load-learnings sections (status check + maintenance): add handling
  for non-zero exit from trimkit-learnings-search ("learnings
  unavailable" note); surface stderr warnings under Known quirks heading
- Write-learnings Call 2: distinguish between "tool not on PATH"
  (skip silently, omit note) and "tool found but failed" ("learning
  write failed: <key>" in report) — mirrors the audit log pattern
- Temp file: use session-scoped path
  /tmp/trimkit-sysops-learning-<SESSION>.json to prevent parallel
  invocation collisions

bin/trimkit-learnings-log:
- Validate deployment, type, and insight fields in addition to key
- Reject unknown type values (must be quirk|known-safe|procedure|warning)

tests: add 3 new bats tests for the new validations (70 total, all passing)
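
The distinction between "tool not on PATH" and "tool found but failed" could look roughly like this in Python. The agent playbook itself uses shell (`command -v`), so this is only an analogous sketch; `write_learning` and its return shape are hypothetical, and only the "learning write failed: <key>" wording comes from the commit message.

```python
import json
import shutil
import subprocess


def write_learning(argv, entry):
    """Run the writer command and return (status, report_note).

    argv is the command to run, e.g. ["trimkit-learnings-log"]; the entry
    is sent to it as JSON on stdin.
    """
    if shutil.which(argv[0]) is None:
        # Tool not installed: skip without adding a note to the report.
        return ("skipped", None)
    result = subprocess.run(
        argv, input=json.dumps(entry), text=True, capture_output=True
    )
    if result.returncode != 0:
        # Tool exists but rejected or failed the write: surface it.
        return ("failed", f"learning write failed: {entry.get('key')}")
    return ("ok", None)
```

Returning the status separately from the note keeps the caller free to change what it reports for the skipped case without touching the detection logic.
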
- bin/trimkit-learnings-search: fix header — dedup is by (deployment, key)
  pair, not bare key; missing-file case exits 0 with no output (no stderr
  message was ever written — the comment was wrong)
- bin/trimkit-learnings-log: fix header — document correct dedup scope
- sysops/README.md: fix trimkit-learnings-search description to say
  (deployment, key) pair
- skills/sysops/SKILL.md: fix error handlers to print to stderr (not
  stdout) so the agent correctly distinguishes errors from learnings output
- tests: add 4 new bats tests — missing insight validation, empty-string
  --deployment, corrupt JSONL skipped with valid entries still surfaced,
  corrupt JSONL emits a stderr warning (74 total, all passing)
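
The corrupt-line behaviour covered by the new tests might be sketched like this. The helper name and the warning text are assumptions; the commit only states that corrupt lines are skipped, valid entries are still surfaced, and a warning goes to stderr.

```python
import json
import sys


def read_jsonl(lines):
    """Parse JSONL, returning (entries, warnings) and skipping bad lines."""
    entries, warnings = [], []
    for lineno, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue  # blank lines are ignored rather than treated as corrupt
        try:
            entries.append(json.loads(raw))
        except json.JSONDecodeError:
            warnings.append(f"warning: skipping corrupt line {lineno}")
    return entries, warnings


if __name__ == "__main__":
    sample = ['{"key": "a"}', "{not json", '{"key": "b"}']
    entries, warnings = read_jsonl(sample)
    for message in warnings:
        print(message, file=sys.stderr)  # keep stdout clean for JSONL output
    for entry in entries:
        print(json.dumps(entry))
```

Keeping warnings off stdout matters here because the agent consumes stdout as learnings data.
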
Adds human-readable formatted output mode alongside the existing JSONL
output. When --human is passed, the script outputs formatted text with
deployment, type, key, confidence, source, and timestamp — plus insight
on a separate line. This moves the viewer logic out of SKILL.md and
into the testable bin script.

Extracts the audit log reading/formatting logic from SKILL.md into a
standalone, testable bin script. Supports --deployment filter,
--last N limit, and --human flag for formatted output. JSONL output
by default for machine consumption.
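
One plausible reading of how `--deployment` and `--last N` combine, as a Python sketch. The function name is hypothetical, and applying the filter before the limit (so that `--last 5` means the last five entries for that deployment) and the exact-match comparison are assumptions consistent with the test plan, not confirmed by it.

```python
def select_entries(entries, deployment=None, last=None):
    """Filter by deployment first, then keep the final `last` entries."""
    selected = list(entries)
    if deployment is not None:
        selected = [e for e in selected if e.get("deployment") == deployment]
    if last is not None:
        if last <= 0:
            # Guard explicitly: a slice of [-0:] would return everything.
            raise ValueError("--last must be a positive integer")
        selected = selected[-last:]
    return selected
```
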
Replaces inline Python viewers for /sysops log and /sysops learnings
with calls to the bin scripts (trimkit-sysops-log-search --human and
trimkit-learnings-search --human). SKILL.md is now a thin routing
layer that parses arguments and delegates to the appropriate script.
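
The routing layer is written as prose in SKILL.md, but its decision can be illustrated in Python. Everything here is a hypothetical sketch: the `route` function, the way a bare first argument is mapped to `--deployment`, and the flag order are assumptions. It compares the whole first word rather than a string prefix, which is the stricter rule this PR later adopts to avoid collisions such as `/sysops logging`.

```python
TOOLS = {
    "log": "trimkit-sysops-log-search",
    "learnings": "trimkit-learnings-search",
}


def route(args):
    """Map a /sysops argument string to an argv list, or None to delegate."""
    words = args.split()
    if not words:
        return None
    tool = TOOLS.get(words[0])  # whole-word match, not a prefix match
    if tool is None:
        return None  # anything else goes to the sysops subagent
    argv = [tool, "--human"]
    rest = words[1:]
    if rest and not rest[0].startswith("--"):
        # Assumed: a bare first argument names the deployment to filter on.
        argv += ["--deployment", rest[0]]
        rest = rest[1:]
    return argv + rest
```
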
Adds 7 tests for trimkit-learnings-search --human flag covering
formatted output, filtering, empty/missing file messages, dedup,
and corrupt line warnings. Creates 29-test suite for the new
trimkit-sysops-log-search covering JSONL output, --deployment filter,
--last limit, --human formatting, error handling, and edge cases.

Adds trimkit-sysops-log-search section to the sysops README and
updates the trimkit-learnings-search section to show --human usage.

- trimkit-learnings-log: wrap json.load() in try-except to emit a clean
  error message on invalid JSON instead of a raw Python traceback
- trimkit-learnings-log: validate confidence field (must be a number 0.0-1.0)
- agents/sysops.md: note learnings write skip (tool not on PATH) instead
  of silently omitting the note, aligning with the audit log pattern
- SKILL.md: add error handling guidance when bin scripts exit non-zero
- Remove unused `os` import from Python blocks in search scripts
- Add CHANGELOG entries for the new features
- Add tests for invalid JSON, missing confidence, and out-of-range confidence
@josephfung

Owner Author

@coderabbitai review

coderabbitai Bot commented May 12, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai Bot commented May 12, 2026

📝 Walkthrough

This pull request implements persistent memory for the sysops agent: an append-only JSONL store at ~/.claude/sysops/learnings.jsonl that captures deployment-specific quirks, procedures, and warnings. Three new shell utilities (trimkit-learnings-log, trimkit-learnings-search, trimkit-sysops-log-search) handle writing and querying entries. The sysops playbook now loads relevant learnings before starting status checks and maintenance, then writes newly discovered patterns after completion. User-facing /sysops log and /sysops learnings commands route through the skill to those utilities. Tests cover validation, deduplication, filtering, corruption handling, and concurrent writes across the entire suite.

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the primary feature: per-deployment learnings persistence for sysops, matching the core changeset. |
| Linked Issues check | ✅ Passed | The PR comprehensively implements all coding requirements from issue #7: learnings JSONL storage, trimkit-learnings-log/search scripts, agent integration, /sysops learnings command, and deduplication logic. |
| Out of Scope Changes check | ✅ Passed | All changes are directly aligned with issue #7 objectives: learnings scripts, agent integration, skill routing, documentation, and comprehensive BATS test coverage. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 7

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@bin/trimkit-learnings-log`:
- Around line 57-60: Validation currently checks only ('deployment', 'key',
'type', 'insight') so the required "source" field is skipped; update the field
list in the validation loop in bin/trimkit-learnings-log (the for field in (...)
block that calls obj.get(field)) to include "source" so the script exits with an
error if obj.get('source') is missing or empty, maintaining the same error
message format for consistency.
- Around line 66-67: The validation wrongly accepts booleans because
isinstance(True, (int, float)) is True; update the conditional that checks conf
(the existing if conf is None or not isinstance(conf, (int, float)) or not (0.0
<= conf <= 1.0):) to explicitly reject bools (e.g., ensure conf is a number and
not an instance of bool) before checking the 0.0–1.0 range so that True/False do
not pass validation; keep the existing error message behavior when the check
fails.

In `@bin/trimkit-learnings-search`:
- Around line 63-64: Update the two echo messages in the
bin/trimkit-learnings-search script so they reference the configured variable
TRIMKIT_SYSOPS_LEARNINGS_FILE instead of the hardcoded
"~/.claude/sysops/learnings.jsonl"; locate the conditional/usage around the
existing echo lines and replace the literal path text with the variable (or an
evaluated/display-friendly form of it) so users see the actual file path the
script is using.

In `@bin/trimkit-sysops-log-search`:
- Around line 78-79: The hardcoded message prints "~/.claude/sysops/audit.jsonl"
even when TRIMKIT_SYSOPS_LOG_FILE is set; update the script to determine the
effective path (e.g. set a LOG_FILE variable from TRIMKIT_SYSOPS_LOG_FILE with a
fallback to "$HOME/.claude/sysops/audit.jsonl") and use that LOG_FILE in the
echo lines instead of the literal string so the message reflects any env
override; ensure tilde expansion by using $HOME rather than "~" when building
the fallback.

In `@skills/sysops/SKILL.md`:
- Line 8: Change the ambiguous "starts with `log` or `learnings`" routing rule
to explicitly state "if the first word of the argument is `log` or `learnings`"
so routing uses a first-word (token) match rather than a string prefix; update
the SKILL.md text around the /sysops routing rule and any related examples to
mention matching the first word (or using a word-boundary/token check) when
deciding not to delegate to the sysops subagent.

In `@sysops/README.md`:
- Around line 84-86: Add language identifiers to the unlabeled fenced code
blocks to satisfy markdownlint MD040: change the block that contains
"~/.claude/sysops/learnings.jsonl" to use a text fence (```text) and change the
command block that lists "/sysops learnings" examples to use a bash shell fence
(```bash); update both occurrences (the single-path block around
"~/.claude/sysops/learnings.jsonl" and the multi-line commands block around
"/sysops learnings ...") so each opening fence includes the appropriate language
label.
- Line 80: The README's Learnings overview incorrectly states deduplication is
by `key`; update the text to state that deduplication is done per (deployment,
key) so entries with the same `key` in different `deployment`s are kept
separately; reference the Learnings description and the terms `key` and
`deployment` and change the sentence to explicitly say "deduplicated by
(deployment, key): the latest entry for a given key within a deployment
supersedes earlier ones."
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 370188ee-367b-486a-af09-e3c0401a6fee

📥 Commits

Reviewing files that changed from the base of the PR and between a5db8eb and 9b85fc8.

📒 Files selected for processing (10)
  • CHANGELOG.md
  • agents/sysops.md
  • bin/trimkit-learnings-log
  • bin/trimkit-learnings-search
  • bin/trimkit-sysops-log-search
  • skills/sysops/SKILL.md
  • sysops/README.md
  • tests/bin/trimkit-learnings-log.bats
  • tests/bin/trimkit-learnings-search.bats
  • tests/bin/trimkit-sysops-log-search.bats

- Validate 'source' field as required in trimkit-learnings-log
- Reject boolean confidence values (bool is a subclass of int in Python)
- Use configured $LEARNINGS_FILE/$LOG_FILE in missing-file messages
  instead of hardcoded default paths
- Tighten SKILL.md routing to "first word" match to prevent prefix
  collisions (e.g. /sysops logging)
- Fix README dedup scope: (deployment, key), not just key
- Add language tags to unlabeled fenced blocks in README (MD040)
- Add tests: boolean confidence, missing source field
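
Taken together, the validation rules after this commit can be sketched as below. This is an illustrative Python sketch under assumptions: `validate_learning`, the constant names, and the message wording are hypothetical. The required fields, the allowed type values, the 0.0-1.0 range, and the bool exclusion come from the commits and review comments above.

```python
ALLOWED_TYPES = {"quirk", "known-safe", "procedure", "warning"}
REQUIRED_FIELDS = ("deployment", "key", "type", "insight", "source")


def validate_learning(obj):
    """Return a list of problems; an empty list means the entry is valid."""
    if not isinstance(obj, dict):
        return ["entry must be a JSON object"]
    problems = [
        f"missing or empty field: {field}"
        for field in REQUIRED_FIELDS
        if not obj.get(field)
    ]
    kind = obj.get("type")
    if kind and (not isinstance(kind, str) or kind not in ALLOWED_TYPES):
        problems.append("type must be one of quirk|known-safe|procedure|warning")
    conf = obj.get("confidence")
    # bool is a subclass of int in Python, so exclude it explicitly;
    # otherwise True would pass as 1 and False as 0.
    is_number = isinstance(conf, (int, float)) and not isinstance(conf, bool)
    if not is_number or not (0.0 <= conf <= 1.0):
        problems.append("confidence must be a number between 0.0 and 1.0")
    return problems
```

Returning a list rather than exiting on the first failure is a design choice of this sketch; the script itself is described as exiting 1 with a message.
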
@josephfung

Owner Author

@coderabbitai approve

coderabbitai Bot commented May 12, 2026

✅ Actions performed

Comments resolved and changes approved.

@josephfung josephfung merged commit 8c739c0 into main May 12, 2026
2 checks passed
Development

Successfully merging this pull request may close these issues.

feat: per-project learnings persistence for sysops

1 participant