diff --git a/CHANGELOG.md b/CHANGELOG.md index d415c40..8d2495a 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -4,6 +4,14 @@ All notable changes to TrimKit are documented here. ## [0.5.1] - Unreleased +### Sysops learnings +- Per-deployment learnings persistence — the sysops agent writes a structured entry to `~/.claude/sysops/learnings.jsonl` whenever it discovers a server quirk, known-safe container, procedure deviation, or other deployment-specific context worth remembering +- `trimkit-learnings-log` — bin script that appends a learning entry to the store; reads JSON from stdin, injects `ts`, validates required fields and type enum, and writes atomically +- `trimkit-learnings-search` — bin script that reads the store, deduplicates by `(deployment, key)` pair (latest entry per pair wins), filters by deployment, and outputs JSONL or formatted text (`--human`) +- `trimkit-sysops-log-search` — bin script extracted from SKILL.md that reads `audit.jsonl`, filters by deployment and entry count (`--last N`), and outputs JSONL or formatted text (`--human`) +- `/sysops learnings` sub-command — view stored learnings for all deployments or a specific one +- SKILL.md refactored from ~170 lines of inline Python to a ~30-line routing layer that delegates to the bin scripts + ### CLAUDE.md guidance - Pull before branching — injected instruction to run `git pull --ff-only` before creating worktrees or branches - Issue tracker hygiene — injected instructions to apply pre-existing labels and include acceptance criteria when creating or editing issues diff --git a/agents/sysops.md b/agents/sysops.md index de3cd0e..5c1353a 100644 --- a/agents/sysops.md +++ b/agents/sysops.md @@ -33,6 +33,8 @@ basename "$PWD" ``` Note the output (e.g. `curia`). This is your PROJECT. +Before working on each deployment, load its stored learnings (see each section below). At the end of each deployment's work, write any learnings you discovered. 
+ ## Intent detection Determine what the user wants based on how they invoked you: @@ -46,6 +48,20 @@ For maintenance, determine scope: ## Status check +### Load deployment learnings + +Before running checks on each deployment, query its stored learnings. Use the literal deployment name (e.g. `Pulse`): + +```bash +if command -v trimkit-learnings-search > /dev/null 2>&1; then trimkit-learnings-search --deployment ""; fi +``` + +If any learnings are returned, surface them briefly before the status report under a **Known quirks** heading, so they inform your interpretation of results. If the command produces output on stderr, include it as a warning under that heading. + +If `trimkit-learnings-search` is not on PATH, skip silently. If it is found but exits non-zero, note `(learnings unavailable — error loading store)` before the status report and continue; do not treat this as fatal. + +### Checks + For each deployment, run these commands via SSH and collect results: ```bash @@ -153,10 +169,66 @@ rm -f /tmp/trimkit-sysops-entry.json If `trimkit-sysops-log` is not found (trimkit not installed), skip logging and note it in the report output: `(audit log skipped — trimkit-sysops-log not on PATH)`. Non-fatal; must not block the report. +### Write learnings (if applicable) + +If you observed something worth persisting for future sessions, write a learning entry. This does not require prompting the user — just write it and note `(learning recorded: )` at the bottom of the report. Only write learnings that would be useful context in a future session; skip routine, uneventful checks. + +Triggers that warrant a learning: +- User confirmed an unregistered container as known-safe (A) → `type: "known-safe"`, `confidence: 1.0` +- A container was consistently unhealthy in a way that appears server-specific → `type: "quirk"`, `confidence: 0.8` +- A persistent server condition (unusual disk state, always-high memory, etc.) 
that is expected and normal for this server → `type: "known-safe"`, `confidence: 0.9` + +Use a stable kebab-case `key` that describes the observation (e.g. `redis-cache-known-safe`, `high-mem-expected-on-pulse`). + +**Call 1** — build the JSON entry. Substitute all placeholders with actual values, including the **literal** SESSION UUID you captured above (not a shell variable — it does not persist across Bash calls). The session UUID in the temp file path prevents collision with concurrent agent invocations: +```bash +python3 -c " +import json, sys +print(json.dumps({ + 'deployment': sys.argv[1], + 'key': sys.argv[2], + 'type': sys.argv[3], + 'insight': sys.argv[4], + 'confidence': float(sys.argv[5]), + 'source': 'observed' +})) +" \ + "" \ + "" \ + "" \ + "" \ + "" \ + > /tmp/trimkit-sysops-learning-.json +``` + +**Call 2** — append to the learnings store. Only run if Call 1 exited successfully. Substitute the same literal SESSION UUID in the temp file path: +```bash +if command -v trimkit-learnings-log > /dev/null 2>&1; then trimkit-learnings-log < /tmp/trimkit-sysops-learning-.json; fi +``` +- If `trimkit-learnings-log` is not on PATH, note `(learning write skipped — trimkit-learnings-log not on PATH)` in the report instead of `(learning recorded: )`. +- If `trimkit-learnings-log` is found but returns a non-zero exit code, note `(learning write failed: )` in the report instead of `(learning recorded: )`. + +**Call 3** — clean up. Substitute the same literal SESSION UUID: +```bash +rm -f /tmp/trimkit-sysops-learning-.json +``` + ## Maintenance Run in sequence for each targeted deployment. Do not proceed to the next deployment until the current one is fully complete (or has failed). +### 0. Load deployment learnings + +Before running maintenance on each deployment, query its stored learnings. 
Use the literal deployment name: + +```bash +if command -v trimkit-learnings-search > /dev/null 2>&1; then trimkit-learnings-search --deployment ""; fi +``` + +If any learnings are returned, surface them under a **Known quirks** heading before beginning work — they may describe procedures or quirks that affect how you should run maintenance. If the command produces output on stderr, include it as a warning under that heading. + +If `trimkit-learnings-search` is not on PATH, skip silently. If it is found but exits non-zero, note `(learnings unavailable — error loading store)` before the maintenance output and continue; do not treat this as fatal. + ### 1. Apply updates ```bash @@ -202,7 +274,20 @@ Containers after: for each Notes: ``` -### 6. Write audit log entry +### 6. Write learnings (if applicable) + +If you discovered something worth persisting for future sessions, write a learning entry — silently, without prompting the user. Note `(learning recorded: )` in the maintenance report for that deployment. Skip if nothing notable was observed. 
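For reference, an entry only reaches the store if it passes the checks that `trimkit-learnings-log` applies. The following is a minimal Python sketch of those checks, assuming the entry has already been parsed into a dict. The function name `validate_learning` is hypothetical, and unlike the script (which exits on the first failure) this sketch collects every problem it finds:

```python
# Illustrative only: mirrors the field, enum, and range checks described for
# trimkit-learnings-log. validate_learning is a hypothetical helper name.
VALID_TYPES = {"quirk", "known-safe", "procedure", "warning"}
REQUIRED_FIELDS = ("deployment", "key", "type", "insight", "source")


def validate_learning(entry):
    """Return a list of problems; an empty list means the entry looks valid."""
    errors = []
    # Every required string field must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if not entry.get(field):
            errors.append(f'"{field}" is required and must be non-empty')
    # type must be one of the four known values.
    if entry.get("type") and entry["type"] not in VALID_TYPES:
        errors.append(f'"type" must be one of {sorted(VALID_TYPES)}')
    # confidence must be a real number in [0.0, 1.0]; booleans are rejected.
    conf = entry.get("confidence")
    if (
        conf is None
        or isinstance(conf, bool)
        or not isinstance(conf, (int, float))
        or not (0.0 <= conf <= 1.0)
    ):
        errors.append('"confidence" must be a number between 0.0 and 1.0')
    return errors
```

Checking an entry this way before Call 2 is optional; the bin script remains the authority on what is accepted.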
+ +Triggers that warrant a learning during maintenance: +- A container did not auto-recover after reboot and required manual restart → `type: "quirk"`, `confidence: 0.9` +- A package upgrade caused unexpected service behaviour → `type: "warning"`, `confidence: 0.8` +- A procedure deviated from the standard playbook in a way that should be repeated next time → `type: "procedure"`, `confidence: 0.9` +- User confirmed an unregistered container as known-safe (A) → `type: "known-safe"`, `confidence: 1.0` +- Any other server-specific quirk that will save time in a future session → `type: "quirk"`, `confidence: 0.7–0.9` + +Use the same 3-call write pattern as in the status check's learnings section: same JSON schema, same `trimkit-learnings-log` invocation, same session-scoped temp file path (`/tmp/trimkit-sysops-learning-.json`), same cleanup, and same Call 2 failure handling (`(learning write failed: )` on non-zero exit, `(learning write skipped — trimkit-learnings-log not on PATH)` if tool not on PATH). + +### 7. Write audit log entry After completing maintenance on a deployment (success or failure), write an audit entry. Run these three commands in sequence: diff --git a/bin/trimkit-learnings-log b/bin/trimkit-learnings-log new file mode 100755 index 0000000..91ef4cc --- /dev/null +++ b/bin/trimkit-learnings-log @@ -0,0 +1,82 @@ +#!/usr/bin/env bash +# trimkit-learnings-log — Append a sysops learning entry to the persistent JSONL store. +# +# Reads a JSON object from stdin, injects a "ts" field (current UTC timestamp), +# and appends the entry as a single line to the learnings log. +# +# Required fields in the JSON input: +# deployment string deployment name (e.g. "Pulse") +# key string stable kebab-case identifier for this learning (e.g. 
"caddy-restart-required-after-upgrade") +# type string one of: quirk, known-safe, procedure, warning +# insight string human-readable description of what was learned +# confidence number 0.0–1.0; how confident the agent is in this learning +# source string how the learning was discovered (e.g. "observed", "inferred") +# +# Deduplication is handled at read time by trimkit-learnings-search: the latest +# entry for a given (deployment, key) pair wins. The same key on different +# deployments is kept as independent entries. This script never modifies existing entries. +# +# The single printf write is atomic on Linux/macOS for payloads under PIPE_BUF +# (~4KB), which comfortably covers the learnings entry schema. +# +# Environment overrides (for testing): +# TRIMKIT_SYSOPS_LEARNINGS_DIR override log directory (default: ~/.claude/sysops) +# TRIMKIT_SYSOPS_LEARNINGS_FILE override log file path (default: $DIR/learnings.jsonl) +# +# Usage: +# echo '{"deployment":"Pulse","key":"caddy-restart","type":"quirk",...}' | trimkit-learnings-log +set -euo pipefail + +LEARNINGS_DIR="${TRIMKIT_SYSOPS_LEARNINGS_DIR:-${HOME}/.claude/sysops}" +LEARNINGS_FILE="${TRIMKIT_SYSOPS_LEARNINGS_FILE:-${LEARNINGS_DIR}/learnings.jsonl}" + +# Read JSON from stdin +json="$(cat)" + +if [ -z "$json" ]; then + echo "trimkit-learnings-log: error: no JSON received on stdin" >&2 + exit 1 +fi + +# Current UTC timestamp +ts="$(date -u +%Y-%m-%dT%H:%M:%SZ)" + +# Inject ts and validate required fields — pipe json into python3 via stdin to +# avoid shell-argument quoting issues and ARG_MAX limits with large payloads. 
+# Validated fields: +# key — dedup anchor; an empty key corrupts the (deployment, key) dedup bucket +# deployment — scopes dedup; an empty value silently files the entry under no deployment +# type — must be a known value to keep the store queryable and meaningful +entry="$(printf '%s' "$json" | python3 -c " +import json, sys +try: + obj = json.load(sys.stdin) +except (json.JSONDecodeError, ValueError) as e: + print(f'trimkit-learnings-log: error: invalid JSON on stdin: {e}', file=sys.stderr) + sys.exit(1) +for field in ('deployment', 'key', 'type', 'insight', 'source'): + if not obj.get(field): + print(f'trimkit-learnings-log: error: \"{field}\" field is required and must be non-empty', file=sys.stderr) + sys.exit(1) +valid_types = {'quirk', 'known-safe', 'procedure', 'warning'} +if obj['type'] not in valid_types: + print(f'trimkit-learnings-log: error: \"type\" must be one of {sorted(valid_types)!r}, got {obj[\"type\"]!r}', file=sys.stderr) + sys.exit(1) +conf = obj.get('confidence') +if ( + conf is None + or isinstance(conf, bool) + or not isinstance(conf, (int, float)) + or not (0.0 <= conf <= 1.0) +): + print(f'trimkit-learnings-log: error: \"confidence\" must be a number between 0.0 and 1.0, got {conf!r}', file=sys.stderr) + sys.exit(1) +obj['ts'] = sys.argv[1] +print(json.dumps(obj, separators=(',', ':'))) +" "$ts")" + +# Create log directory if needed +mkdir -p "$LEARNINGS_DIR" || { echo "trimkit-learnings-log: error: cannot create log directory '$LEARNINGS_DIR'" >&2; exit 1; } + +# Append — atomic for single-line writes under PIPE_BUF +printf '%s\n' "$entry" >> "$LEARNINGS_FILE" || { echo "trimkit-learnings-log: error: cannot write to '$LEARNINGS_FILE'" >&2; exit 1; } diff --git a/bin/trimkit-learnings-search b/bin/trimkit-learnings-search new file mode 100755 index 0000000..44150f7 --- /dev/null +++ b/bin/trimkit-learnings-search @@ -0,0 +1,144 @@ +#!/usr/bin/env bash +# trimkit-learnings-search — Query the sysops learnings store. 
+# +# Reads ~/.claude/sysops/learnings.jsonl, deduplicates entries by (deployment, key) +# pair (latest entry per pair wins), optionally filters by deployment, and outputs +# the surviving entries to stdout. +# +# Usage: +# trimkit-learnings-search # JSONL output (deduplicated) +# trimkit-learnings-search --deployment Pulse # filter to Pulse deployment +# trimkit-learnings-search --human # human-readable formatted output +# trimkit-learnings-search --human --deployment Pulse +# +# Arguments: +# --deployment Case-insensitive deployment filter (optional) +# --human Output human-readable formatted text instead of JSONL +# +# Output: +# Default: One JSON object per line (JSONL), latest-wins deduplicated. +# --human: Formatted text with deployment, type, key, confidence, source, timestamp, +# and insight for each entry. Shows "No learnings found" when empty. +# Exits 0 with no output (JSONL) or a message (--human) if no matching entries exist +# or if the store doesn't exist yet. +# +# Environment overrides (for testing): +# TRIMKIT_SYSOPS_LEARNINGS_DIR override log directory (default: ~/.claude/sysops) +# TRIMKIT_SYSOPS_LEARNINGS_FILE override log file path (default: $DIR/learnings.jsonl) +set -euo pipefail + +LEARNINGS_DIR="${TRIMKIT_SYSOPS_LEARNINGS_DIR:-${HOME}/.claude/sysops}" +LEARNINGS_FILE="${TRIMKIT_SYSOPS_LEARNINGS_FILE:-${LEARNINGS_DIR}/learnings.jsonl}" + +# Parse arguments +deployment_filter="" +human_output="false" +while [ $# -gt 0 ]; do + case "$1" in + --deployment) + # Require a non-empty value following the flag + if [ $# -lt 2 ] || [ -z "${2:-}" ]; then + echo "trimkit-learnings-search: error: --deployment requires an argument" >&2 + echo "Usage: trimkit-learnings-search [--deployment ] [--human]" >&2 + exit 1 + fi + shift + deployment_filter="$1" + ;; + --human) + human_output="true" + ;; + *) + echo "trimkit-learnings-search: error: unknown argument '$1'" >&2 + echo "Usage: trimkit-learnings-search [--deployment ] [--human]" >&2 + exit 1 + ;; + 
esac + shift +done + +if [ ! -f "$LEARNINGS_FILE" ]; then + # Not an error — the store simply hasn't been written to yet + if [ "$human_output" = "true" ]; then + echo "No learnings stored yet at $LEARNINGS_FILE" + echo "Learnings are written by the sysops agent when it discovers server quirks." + fi + exit 0 +fi + +# Read, deduplicate by (deployment, key) pair (latest entry wins), filter by +# deployment, and output surviving entries (JSONL or human-readable). +# Dedup strategy: scan all lines in order; for each (deployment, key) pair, keep +# overwriting with the latest seen entry. Scoping dedup to deployment prevents a +# key written for one deployment from suppressing the same key on another. +python3 -c " +import json, sys + +learnings_file = sys.argv[1] +deployment_filter = sys.argv[2] # empty string means no filter +human_output = sys.argv[3] == 'true' + +seen = {} # (deployment, key) -> entry; latest entry per pair wins +order = [] # insertion-order list of (deployment, key) pairs, first-seen only + +corrupt_count = 0 +try: + with open(learnings_file) as f: + for line in f: + line = line.strip() + if not line: + continue + try: + obj = json.loads(line) + except json.JSONDecodeError: + corrupt_count += 1 + continue + dedup_key = (obj.get('deployment', ''), obj.get('key', '')) + if dedup_key not in seen: + order.append(dedup_key) + seen[dedup_key] = obj +except PermissionError: + print(f'trimkit-learnings-search: error: cannot read {learnings_file} (permission denied)', file=sys.stderr) + sys.exit(1) +except OSError as e: + print(f'trimkit-learnings-search: error: cannot read learnings file: {e}', file=sys.stderr) + sys.exit(1) + +if corrupt_count: + print(f'trimkit-learnings-search: warning: {corrupt_count} corrupt line(s) skipped', file=sys.stderr) + +# Collect filtered entries in first-seen order +entries = [] +for dedup_key in order: + entry = seen[dedup_key] + if deployment_filter and entry.get('deployment', '').lower() != deployment_filter.lower(): + 
continue + entries.append(entry) + +if human_output: + if not entries: + msg = 'No learnings found' + if deployment_filter: + msg += f' for deployment {deployment_filter!r}' + print(msg + '.') + if corrupt_count: + print(f'Warning: {corrupt_count} corrupt line(s) skipped in learnings store.') + sys.exit(0) + for e in entries: + ts = e.get('ts', 'unknown') + dep = e.get('deployment', 'unknown') + key = e.get('key', '?') + ltype = e.get('type', '?') + insight = e.get('insight', '') + confidence = e.get('confidence', '?') + source = e.get('source', '?') + print(f'{dep} [{ltype}] {key} (confidence: {confidence}, source: {source}, recorded: {ts})') + if insight: + print(f' {insight}') + print() + if corrupt_count: + print(f'Warning: {corrupt_count} corrupt line(s) skipped in learnings store.') +else: + for e in entries: + print(json.dumps(e, separators=(',', ':'))) +" "$LEARNINGS_FILE" "$deployment_filter" "$human_output" diff --git a/bin/trimkit-sysops-log-search b/bin/trimkit-sysops-log-search new file mode 100755 index 0000000..8a67207 --- /dev/null +++ b/bin/trimkit-sysops-log-search @@ -0,0 +1,166 @@ +#!/usr/bin/env bash +# trimkit-sysops-log-search — Query the sysops audit log. +# +# Reads ~/.claude/sysops/audit.jsonl, optionally filters by deployment, limits +# to the most recent N entries, and outputs the results to stdout. +# +# Usage: +# trimkit-sysops-log-search # JSONL, last 10 +# trimkit-sysops-log-search --deployment Pulse # filter to Pulse +# trimkit-sysops-log-search --last 25 # last 25 entries +# trimkit-sysops-log-search --human # human-readable output +# trimkit-sysops-log-search --human --deployment Pulse --last 5 +# +# Arguments: +# --deployment Case-insensitive deployment filter (optional) +# --last Limit to N most recent entries (default: 10) +# --human Output human-readable formatted text instead of JSONL +# +# Output: +# Default: One JSON object per line (JSONL). 
+# --human: Formatted text with timestamp, deployment, env, action, containers, +# unregistered containers, and notes. Shows "No audit log found" when +# the log file doesn't exist, or "No entries found" when empty/filtered. +# Exits 0 with no output (JSONL) or a message (--human) if no matching entries +# exist or if the log doesn't exist yet. +# +# Environment overrides (for testing): +# TRIMKIT_SYSOPS_LOG_DIR override log directory (default: ~/.claude/sysops) +# TRIMKIT_SYSOPS_LOG_FILE override log file path (default: $DIR/audit.jsonl) +set -euo pipefail + +LOG_DIR="${TRIMKIT_SYSOPS_LOG_DIR:-${HOME}/.claude/sysops}" +LOG_FILE="${TRIMKIT_SYSOPS_LOG_FILE:-${LOG_DIR}/audit.jsonl}" + +# Parse arguments +deployment_filter="" +limit="10" +human_output="false" +while [ $# -gt 0 ]; do + case "$1" in + --deployment) + if [ $# -lt 2 ] || [ -z "${2:-}" ]; then + echo "trimkit-sysops-log-search: error: --deployment requires an argument" >&2 + echo "Usage: trimkit-sysops-log-search [--deployment ] [--last ] [--human]" >&2 + exit 1 + fi + shift + deployment_filter="$1" + ;; + --last) + if [ $# -lt 2 ] || [ -z "${2:-}" ]; then + echo "trimkit-sysops-log-search: error: --last requires an argument" >&2 + echo "Usage: trimkit-sysops-log-search [--deployment ] [--last ] [--human]" >&2 + exit 1 + fi + shift + # Validate that the value is a positive integer + if ! printf '%s' "$1" | grep -qE '^[1-9][0-9]*$'; then + echo "trimkit-sysops-log-search: error: --last must be a positive integer, got '$1'" >&2 + exit 1 + fi + limit="$1" + ;; + --human) + human_output="true" + ;; + *) + echo "trimkit-sysops-log-search: error: unknown argument '$1'" >&2 + echo "Usage: trimkit-sysops-log-search [--deployment ] [--last ] [--human]" >&2 + exit 1 + ;; + esac + shift +done + +if [ ! -f "$LOG_FILE" ]; then + if [ "$human_output" = "true" ]; then + echo "No audit log found at $LOG_FILE" + echo "Logs are written after each /sysops status or /sysops update run." 
+ fi + exit 0 +fi + +# Read audit log, filter by deployment, take last N entries, and output +# as JSONL or human-readable text. +python3 -c " +import json, sys + +log_file = sys.argv[1] +deployment_filter = sys.argv[2] # empty string means no filter +limit = int(sys.argv[3]) +human_output = sys.argv[4] == 'true' + +entries = [] +corrupt_count = 0 +try: + with open(log_file) as f: + for line in f: + line = line.strip() + if not line: + continue + try: + obj = json.loads(line) + except json.JSONDecodeError: + corrupt_count += 1 + continue + if deployment_filter and obj.get('deployment', '').lower() != deployment_filter.lower(): + continue + entries.append(obj) +except PermissionError: + print(f'trimkit-sysops-log-search: error: cannot read {log_file} (permission denied)', file=sys.stderr) + sys.exit(1) +except OSError as e: + print(f'trimkit-sysops-log-search: error: cannot read audit log: {e}', file=sys.stderr) + sys.exit(1) + +entries = entries[-limit:] + +if corrupt_count: + print(f'trimkit-sysops-log-search: warning: {corrupt_count} corrupt line(s) skipped', file=sys.stderr) + +if human_output: + if not entries: + msg = 'No entries found' + if deployment_filter: + msg += f' for deployment {deployment_filter!r}' + print(msg + '.') + if corrupt_count: + print(f'Warning: {corrupt_count} corrupt line(s) skipped in audit log.') + sys.exit(0) + + for e in entries: + ts = e.get('ts', 'unknown') + dep = e.get('deployment', 'unknown') + env = e.get('env', '?') + action = e.get('action', '?') + project = e.get('project', '?') + print(f'{ts} {dep} [{env}] {action} (project: {project})') + + if action == 'maintenance': + pkgs = e.get('packages_upgraded', 0) + reboot = e.get('reboot_performed', False) + dur = e.get('reboot_duration_s') + reboot_str = f'yes ({dur}s)' if reboot and dur else ('yes' if reboot else 'no') + print(f' Packages upgraded: {pkgs} \u00b7 Reboot: {reboot_str}') + + containers = e.get('containers', {}) + if containers: + parts = [n + (' \u2713' if s == 
'healthy' else ' \u2717') for n, s in containers.items()] + print(' Containers: ' + ' '.join(parts)) + + unregistered = e.get('unregistered_containers', []) + unreg_str = ' '.join(u + ' \u26a0' for u in unregistered) if unregistered else 'none' + print(f' Unregistered: {unreg_str}') + + notes = e.get('notes', '') + if notes: + print(f' Notes: {notes}') + print() + + if corrupt_count: + print(f'Warning: {corrupt_count} corrupt line(s) skipped in audit log.') +else: + for e in entries: + print(json.dumps(e, separators=(',', ':'))) +" "$LOG_FILE" "$deployment_filter" "$limit" "$human_output" diff --git a/skills/sysops/SKILL.md b/skills/sysops/SKILL.md index 0f85de8..aec96bc 100644 --- a/skills/sysops/SKILL.md +++ b/skills/sysops/SKILL.md @@ -5,7 +5,7 @@ description: Invoke the sysops subagent for VPS maintenance. Use when the user r # Sysops -If the argument starts with `log`, handle it directly — do NOT delegate to the sysops subagent. +If the first word of the argument is `log` or `learnings`, handle it directly — do NOT delegate to the sysops subagent. 
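The first-word rule above can be sketched as follows. This is a minimal illustration, assuming an exact, case-sensitive match on the first whitespace-separated word; the function name `route` and the `"delegate"` return value are hypothetical and not part of the skill:

```python
# Illustrative only: route is a hypothetical helper, not part of TrimKit.
def route(argument):
    """Return 'log' or 'learnings' for direct handling, else 'delegate'."""
    words = argument.split()
    first = words[0] if words else ""
    # Compare the whole first word, so an argument such as "logrotate check"
    # is not mistaken for the "log" sub-command.
    if first in ("log", "learnings"):
        return first
    return "delegate"
```

Matching on the whole first word rather than a prefix keeps arguments that merely begin with the letters `log` flowing to the subagent.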
## /sysops log @@ -14,89 +14,36 @@ Parse the argument: - `/sysops log ` → show last 10 entries for that deployment (case-insensitive match) - `/sysops log --last ` → show last N entries for that deployment -Run this command, substituting `DEPLOYMENT_FILTER` (empty string `""` if no deployment was given) and `LIMIT` (default `10`): +Run this command, substituting `` (omit `--deployment` entirely if no deployment was given) and `` (default `10`): ```bash -DEPLOYMENT_FILTER="" LIMIT= python3 -c " -import json, os, sys - -log_file = os.path.expanduser('~/.claude/sysops/audit.jsonl') -deployment_filter = os.environ.get('DEPLOYMENT_FILTER', '') -limit = int(os.environ.get('LIMIT', '10')) - -if not os.path.exists(log_file): - print('No audit log found at ~/.claude/sysops/audit.jsonl') - print('Logs are written after each /sysops status or /sysops update run.') - sys.exit(0) - -entries = [] -corrupt_count = 0 -try: - with open(log_file) as f: - for line in f: - line = line.strip() - if not line: - continue - try: - obj = json.loads(line) - except json.JSONDecodeError: - corrupt_count += 1 - continue - if deployment_filter and obj.get('deployment', '').lower() != deployment_filter.lower(): - continue - entries.append(obj) -except PermissionError: - print(f'Error: cannot read audit log at {log_file} (permission denied).') - sys.exit(1) -except OSError as e: - print(f'Error: cannot read audit log: {e}') - sys.exit(1) - -entries = entries[-limit:] - -if not entries: - msg = 'No entries found' - if deployment_filter: - msg += f' for deployment {deployment_filter!r}' - print(msg + '.') - if corrupt_count: - print(f'Warning: {corrupt_count} corrupt line(s) skipped in audit log.') - sys.exit(0) - -for e in entries: - ts = e.get('ts', 'unknown') - dep = e.get('deployment', 'unknown') - env = e.get('env', '?') - action = e.get('action', '?') - project = e.get('project', '?') - print(f'{ts} {dep} [{env}] {action} (project: {project})') - - if action == 'maintenance': - pkgs = 
e.get('packages_upgraded', 0) - reboot = e.get('reboot_performed', False) - dur = e.get('reboot_duration_s') - reboot_str = f'yes ({dur}s)' if reboot and dur else ('yes' if reboot else 'no') - print(f' Packages upgraded: {pkgs} · Reboot: {reboot_str}') - - containers = e.get('containers', {}) - if containers: - parts = [n + (' ✓' if s == 'healthy' else ' ✗') for n, s in containers.items()] - print(' Containers: ' + ' '.join(parts)) - - unregistered = e.get('unregistered_containers', []) - unreg_str = ' '.join(u + ' ⚠' for u in unregistered) if unregistered else 'none' - print(f' Unregistered: {unreg_str}') - - notes = e.get('notes', '') - if notes: - print(f' Notes: {notes}') - print() - -if corrupt_count: - print(f'Warning: {corrupt_count} corrupt line(s) skipped in audit log.') -" +trimkit-sysops-log-search --human [--deployment ""] [--last ] ``` +If `trimkit-sysops-log-search` is not on PATH, tell the user: +> `trimkit-sysops-log-search` is not installed. Run `install.sh` from your trimkit directory to set it up. + +If `trimkit-sysops-log-search` is found but exits non-zero, show the user the error output and suggest checking file permissions on `~/.claude/sysops/audit.jsonl`. + +--- + +## /sysops learnings + +Parse the argument: +- `/sysops learnings` → show all stored learnings across all deployments (deduplicated) +- `/sysops learnings ` → show learnings for that deployment only (case-insensitive match) + +Run this command, substituting `` (omit `--deployment` entirely if no deployment was given): + +```bash +trimkit-learnings-search --human [--deployment ""] +``` + +If `trimkit-learnings-search` is not on PATH, tell the user: +> `trimkit-learnings-search` is not installed. Run `install.sh` from your trimkit directory to set it up. + +If `trimkit-learnings-search` is found but exits non-zero, show the user the error output and suggest checking file permissions on `~/.claude/sysops/learnings.jsonl`. 
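As a rough illustration of what "deduplicated" means for this command, the sketch below keeps the latest entry per `(deployment, key)` pair and then applies a case-insensitive deployment filter. It is a simplified Python approximation of the behaviour described for `trimkit-learnings-search`, not the script itself; the function name `latest_per_pair` is hypothetical:

```python
import json


# Illustrative only: latest_per_pair is a hypothetical helper name.
def latest_per_pair(lines, deployment_filter=""):
    """Keep the last entry seen per (deployment, key), in first-seen order."""
    seen = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue  # corrupt lines are skipped rather than treated as fatal
        if not isinstance(obj, dict):
            continue  # assumption: non-object lines are ignored in this sketch
        pair = (obj.get("deployment", ""), obj.get("key", ""))
        # Overwriting the value keeps the pair's original position in the
        # dict while making the latest entry the one that survives.
        seen[pair] = obj
    wanted = deployment_filter.lower()
    return [
        entry
        for entry in seen.values()
        if not wanted or str(entry.get("deployment", "")).lower() == wanted
    ]
```

Because the pair includes the deployment, the same key recorded on two deployments yields two independent results.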
+ --- Otherwise, delegate to the `sysops` subagent using the Agent tool. diff --git a/sysops/README.md b/sysops/README.md index 84218b6..e8f8d50 100644 --- a/sysops/README.md +++ b/sysops/README.md @@ -58,3 +58,81 @@ Use the `/sysops log` slash command: Environment overrides for testing: - `TRIMKIT_SYSOPS_LOG_DIR` — override the log directory - `TRIMKIT_SYSOPS_LOG_FILE` — override the log file path + +## trimkit-sysops-log-search + +`bin/trimkit-sysops-log-search` reads the audit log, optionally filters by deployment, limits to the most recent N entries, and outputs the results. Used by the `/sysops log` slash command. + +```bash +trimkit-sysops-log-search # last 10 entries (JSONL) +trimkit-sysops-log-search --deployment Pulse # filter to Pulse +trimkit-sysops-log-search --last 25 # last 25 entries +trimkit-sysops-log-search --human # human-readable output +trimkit-sysops-log-search --human --deployment Pulse --last 5 +``` + +Environment overrides for testing: +- `TRIMKIT_SYSOPS_LOG_DIR` — override the log directory +- `TRIMKIT_SYSOPS_LOG_FILE` — override the log file path + +## Learnings + +The sysops agent writes a learning entry whenever it discovers something worth persisting for future sessions — server quirks, known-safe containers, procedure deviations, and similar deployment-specific context. Learnings are deduplicated by `(deployment, key)`: the latest entry for a given pair supersedes earlier ones. History is preserved in the file (append-only); only the latest entry per pair is surfaced at read time. + +### Learnings location + +```text +~/.claude/sysops/learnings.jsonl +``` + +### Schema + +Each line is a JSON object: + +| Field | Type | Description | +|--------------|-------------------|--------------------------------------------------------------------| +| `ts` | string (ISO 8601) | UTC timestamp when the entry was written | +| `deployment` | string | Deployment name (e.g. 
`Pulse`) | +| `key` | string | Stable kebab-case identifier for this learning (dedup key) | +| `type` | string | `quirk`, `known-safe`, `procedure`, or `warning` | +| `insight` | string | Human-readable description of what was learned | +| `confidence` | number (0.0–1.0) | How confident the agent is; low-confidence entries are tentative | +| `source` | string | How the learning was discovered (e.g. `observed`, `inferred`) | + +### Example entry + +```json +{"ts":"2026-04-10T12:00:00Z","deployment":"Pulse","key":"caddy-restart-required-after-upgrade","type":"quirk","insight":"Caddy container requires manual restart after apt upgrade — it does not auto-recover.","confidence":0.9,"source":"observed"} +``` + +## Viewing learnings + +Use the `/sysops learnings` slash command: + +```bash +/sysops learnings # all stored learnings (deduplicated) +/sysops learnings Pulse # learnings for Pulse only +``` + +## trimkit-learnings-log + +`bin/trimkit-learnings-log` appends a learning entry to the learnings store. It reads JSON from stdin, injects `ts`, and appends to the file. It is installed to `~/.trimkit/bin/` by `install.sh`. + +Environment overrides for testing: +- `TRIMKIT_SYSOPS_LEARNINGS_DIR` — override the learnings directory +- `TRIMKIT_SYSOPS_LEARNINGS_FILE` — override the learnings file path + +## trimkit-learnings-search + +`bin/trimkit-learnings-search` reads the learnings store, deduplicates by `(deployment, key)` pair (latest entry per pair wins), optionally filters by deployment, and outputs surviving entries. Used by the `/sysops learnings` slash command. 
+ +```bash +trimkit-learnings-search # all learnings (JSONL) +trimkit-learnings-search --deployment Pulse # Pulse learnings only +trimkit-learnings-search --human # human-readable output +trimkit-learnings-search --human --deployment Pulse # filtered, human-readable +``` + +Environment overrides for testing: +- `TRIMKIT_SYSOPS_LEARNINGS_DIR` — override the learnings directory +- `TRIMKIT_SYSOPS_LEARNINGS_FILE` — override the learnings file path diff --git a/tests/bin/trimkit-learnings-log.bats b/tests/bin/trimkit-learnings-log.bats new file mode 100644 index 0000000..8727279 --- /dev/null +++ b/tests/bin/trimkit-learnings-log.bats @@ -0,0 +1,184 @@ +#!/usr/bin/env bats + +# tests/bin/trimkit-learnings-log.bats — tests for bin/trimkit-learnings-log + +setup() { + load '../test_helper/bats-support/load' + load '../test_helper/bats-assert/load' + SCRIPT="$BATS_TEST_DIRNAME/../../bin/trimkit-learnings-log" + TMPDIR_CUSTOM="$(mktemp -d)" + export TRIMKIT_SYSOPS_LEARNINGS_DIR="$TMPDIR_CUSTOM" + export TRIMKIT_SYSOPS_LEARNINGS_FILE="$TMPDIR_CUSTOM/learnings.jsonl" +} + +teardown() { + rm -rf "$TMPDIR_CUSTOM" +} + +SAMPLE='{"deployment":"Pulse","key":"caddy-restart-required-after-upgrade","type":"quirk","insight":"Caddy container requires manual restart after apt upgrade.","confidence":0.9,"source":"observed"}' + +@test "script exists and is executable" { + [ -x "$SCRIPT" ] +} + +@test "creates log directory if it does not exist" { + rm -rf "$TMPDIR_CUSTOM" + echo "$SAMPLE" | bash "$SCRIPT" + [ -d "$TMPDIR_CUSTOM" ] +} + +@test "appends one line per call" { + echo "$SAMPLE" | bash "$SCRIPT" + count="$(wc -l < "$TRIMKIT_SYSOPS_LEARNINGS_FILE" | tr -d ' ')" + assert_equal "$count" "1" +} + +@test "two calls produce two lines" { + echo "$SAMPLE" | bash "$SCRIPT" + echo "$SAMPLE" | bash "$SCRIPT" + count="$(wc -l < "$TRIMKIT_SYSOPS_LEARNINGS_FILE" | tr -d ' ')" + assert_equal "$count" "2" +} + +@test "each line is valid JSON" { + echo "$SAMPLE" | bash "$SCRIPT" + python3 -c " 
+import json +with open('$TRIMKIT_SYSOPS_LEARNINGS_FILE') as f: + for line in f: + json.loads(line) # raises ValueError if invalid +" +} + +@test "injects ts field in ISO 8601 UTC format" { + echo "$SAMPLE" | bash "$SCRIPT" + python3 -c " +import json, re +with open('$TRIMKIT_SYSOPS_LEARNINGS_FILE') as f: + obj = json.loads(f.read().strip()) +assert 'ts' in obj, 'ts field missing' +assert re.match(r'^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z$', obj['ts']), f'bad ts format: {obj[\"ts\"]}' +" +} + +@test "preserves all input fields" { + input='{"deployment":"Pulse","key":"caddy-restart-required-after-upgrade","type":"quirk","insight":"Caddy needs restart.","confidence":0.9,"source":"observed"}' + echo "$input" | bash "$SCRIPT" + python3 -c " +import json +with open('$TRIMKIT_SYSOPS_LEARNINGS_FILE') as f: + obj = json.loads(f.read().strip()) +assert obj['deployment'] == 'Pulse' +assert obj['key'] == 'caddy-restart-required-after-upgrade' +assert obj['type'] == 'quirk' +assert obj['insight'] == 'Caddy needs restart.' 
+assert obj['confidence'] == 0.9 +assert obj['source'] == 'observed' +" +} + +@test "exits 1 with error message when stdin is empty" { + run bash -c "echo '' | bash '$SCRIPT'" + assert_failure + assert_output --partial "error: no JSON received on stdin" +} + +@test "exits 0 on success" { + run bash -c "echo '$SAMPLE' | bash '$SCRIPT'" + assert_success +} + +@test "exits 1 with error message when insight field is missing" { + run bash -c "echo '{\"deployment\":\"Pulse\",\"key\":\"my-key\",\"type\":\"quirk\",\"confidence\":0.9,\"source\":\"observed\"}' | bash '$SCRIPT'" + assert_failure + assert_output --partial "error:" +} + +@test "exits 1 with error message when key field is missing" { + run bash -c "echo '{\"deployment\":\"Pulse\",\"type\":\"quirk\",\"insight\":\"x\",\"confidence\":0.9,\"source\":\"observed\"}' | bash '$SCRIPT'" + assert_failure + assert_output --partial "error:" +} + +@test "exits 1 with error message when key field is empty string" { + run bash -c "echo '{\"deployment\":\"Pulse\",\"key\":\"\",\"type\":\"quirk\",\"insight\":\"x\",\"confidence\":0.9,\"source\":\"observed\"}' | bash '$SCRIPT'" + assert_failure + assert_output --partial "error:" +} + +@test "exits 1 with error message when deployment field is missing" { + run bash -c "echo '{\"key\":\"my-key\",\"type\":\"quirk\",\"insight\":\"x\",\"confidence\":0.9,\"source\":\"observed\"}' | bash '$SCRIPT'" + assert_failure + assert_output --partial "error:" +} + +@test "exits 1 with error message when type is not a valid value" { + run bash -c "echo '{\"deployment\":\"Pulse\",\"key\":\"my-key\",\"type\":\"unknown\",\"insight\":\"x\",\"confidence\":0.9,\"source\":\"observed\"}' | bash '$SCRIPT'" + assert_failure + assert_output --partial "error:" +} + +@test "accepts all four valid type values" { + for t in quirk known-safe procedure warning; do + run bash -c "echo '{\"deployment\":\"Pulse\",\"key\":\"k\",\"type\":\"$t\",\"insight\":\"x\",\"confidence\":0.9,\"source\":\"observed\"}' | bash 
'$SCRIPT'" + assert_success + done +} + +@test "exits 1 with clean error message when stdin is invalid JSON" { + run bash -c "echo 'not json at all' | bash '$SCRIPT'" + assert_failure + assert_output --partial "error: invalid JSON on stdin" + # Must NOT produce a Python traceback — the message should start with the tool name + refute_output --partial "Traceback" +} + +@test "exits 1 with error when confidence is missing" { + run bash -c "echo '{\"deployment\":\"Pulse\",\"key\":\"k\",\"type\":\"quirk\",\"insight\":\"x\",\"source\":\"observed\"}' | bash '$SCRIPT'" + assert_failure + assert_output --partial "error:" + assert_output --partial "confidence" +} + +@test "exits 1 with error when confidence is a string" { + run bash -c "echo '{\"deployment\":\"Pulse\",\"key\":\"k\",\"type\":\"quirk\",\"insight\":\"x\",\"confidence\":\"0.9\",\"source\":\"observed\"}' | bash '$SCRIPT'" + assert_failure + assert_output --partial "error:" + assert_output --partial "confidence" +} + +@test "exits 1 with error when confidence is out of range" { + run bash -c "echo '{\"deployment\":\"Pulse\",\"key\":\"k\",\"type\":\"quirk\",\"insight\":\"x\",\"confidence\":1.5,\"source\":\"observed\"}' | bash '$SCRIPT'" + assert_failure + assert_output --partial "error:" + assert_output --partial "confidence" +} + +@test "exits 1 with error when confidence is a boolean" { + run bash -c "echo '{\"deployment\":\"Pulse\",\"key\":\"k\",\"type\":\"quirk\",\"insight\":\"x\",\"confidence\":true,\"source\":\"observed\"}' | bash '$SCRIPT'" + assert_failure + assert_output --partial "error:" + assert_output --partial "confidence" +} + +@test "exits 1 with error message when source field is missing" { + run bash -c "echo '{\"deployment\":\"Pulse\",\"key\":\"k\",\"type\":\"quirk\",\"insight\":\"x\",\"confidence\":0.9}' | bash '$SCRIPT'" + assert_failure + assert_output --partial "error:" +} + +@test "concurrent writes all produce valid JSONL lines" { + for i in 1 2 3 4 5; do + echo 
"{\"deployment\":\"Pulse\",\"key\":\"key-${i}\",\"type\":\"quirk\",\"insight\":\"learning ${i}\",\"confidence\":0.8,\"source\":\"observed\"}" \ + | bash "$SCRIPT" & + done + wait + python3 -c " +import json +with open('$TRIMKIT_SYSOPS_LEARNINGS_FILE') as f: + lines = [l for l in f if l.strip()] +assert len(lines) == 5, f'expected 5 lines, got {len(lines)}' +for line in lines: + json.loads(line) +" +} diff --git a/tests/bin/trimkit-learnings-search.bats b/tests/bin/trimkit-learnings-search.bats new file mode 100644 index 0000000..f63eef2 --- /dev/null +++ b/tests/bin/trimkit-learnings-search.bats @@ -0,0 +1,239 @@ +#!/usr/bin/env bats + +# tests/bin/trimkit-learnings-search.bats — tests for bin/trimkit-learnings-search + +setup() { + load '../test_helper/bats-support/load' + load '../test_helper/bats-assert/load' + SCRIPT="$BATS_TEST_DIRNAME/../../bin/trimkit-learnings-search" + TMPDIR_CUSTOM="$(mktemp -d)" + export TRIMKIT_SYSOPS_LEARNINGS_DIR="$TMPDIR_CUSTOM" + export TRIMKIT_SYSOPS_LEARNINGS_FILE="$TMPDIR_CUSTOM/learnings.jsonl" +} + +teardown() { + rm -rf "$TMPDIR_CUSTOM" +} + +# Write a learning entry directly to the file (bypassing trimkit-learnings-log +# so these tests don't depend on the other script). 
+write_entry() { + printf '%s\n' "$1" >> "$TRIMKIT_SYSOPS_LEARNINGS_FILE" +} + +ENTRY_PULSE_A='{"ts":"2026-04-10T10:00:00Z","deployment":"Pulse","key":"caddy-restart","type":"quirk","insight":"Caddy needs restart after upgrade.","confidence":0.9,"source":"observed"}' +ENTRY_PULSE_B='{"ts":"2026-04-11T10:00:00Z","deployment":"Pulse","key":"caddy-restart","type":"quirk","insight":"Caddy restart confirmed again.","confidence":0.95,"source":"observed"}' +ENTRY_CURIA='{"ts":"2026-04-10T11:00:00Z","deployment":"Curia","key":"postgres-slow-start","type":"quirk","insight":"Postgres takes ~30s to accept connections after reboot.","confidence":0.8,"source":"observed"}' + +@test "script exists and is executable" { + [ -x "$SCRIPT" ] +} + +@test "exits 0 with no output when learnings file does not exist" { + run bash "$SCRIPT" + assert_success + assert_output "" +} + +@test "outputs all entries when no filter given" { + write_entry "$ENTRY_PULSE_A" + write_entry "$ENTRY_CURIA" + run bash "$SCRIPT" + assert_success + # Two distinct keys → two output lines + line_count="$(echo "$output" | grep -c .)" + assert_equal "$line_count" "2" +} + +@test "deduplicates by key — latest entry wins" { + # Write the same key twice; second entry should supersede the first + write_entry "$ENTRY_PULSE_A" + write_entry "$ENTRY_PULSE_B" + run bash "$SCRIPT" + assert_success + line_count="$(echo "$output" | grep -c .)" + assert_equal "$line_count" "1" + # The surviving entry should have the later insight + assert_output --partial "Caddy restart confirmed again." 
+} + +@test "dedup preserves entries with distinct keys" { + write_entry "$ENTRY_PULSE_A" + write_entry "$ENTRY_CURIA" + run bash "$SCRIPT" + assert_success + line_count="$(echo "$output" | grep -c .)" + assert_equal "$line_count" "2" +} + +@test "filters by deployment" { + write_entry "$ENTRY_PULSE_A" + write_entry "$ENTRY_CURIA" + run bash "$SCRIPT" --deployment Pulse + assert_success + assert_output --partial "caddy-restart" + # Curia entry must not appear + refute_output --partial "postgres-slow-start" +} + +@test "deployment filter is case-insensitive" { + write_entry "$ENTRY_PULSE_A" + run bash "$SCRIPT" --deployment pulse + assert_success + assert_output --partial "caddy-restart" +} + +@test "returns no output when deployment filter matches nothing" { + write_entry "$ENTRY_PULSE_A" + run bash "$SCRIPT" --deployment Nonexistent + assert_success + assert_output "" +} + +@test "each output line is valid JSON" { + write_entry "$ENTRY_PULSE_A" + write_entry "$ENTRY_CURIA" + run bash "$SCRIPT" + assert_success + echo "$output" | python3 -c " +import json, sys +for line in sys.stdin: + line = line.strip() + if line: + json.loads(line) +" +} + +@test "exits 1 with error on unknown argument" { + run bash "$SCRIPT" --unknown-flag + assert_failure + assert_output --partial "error: unknown argument" +} + +@test "exits 1 with error when --deployment is given without a value" { + run bash "$SCRIPT" --deployment + assert_failure + assert_output --partial "error:" +} + +@test "exits 1 with error when --deployment is given an empty string" { + run bash "$SCRIPT" --deployment "" + assert_failure + assert_output --partial "error:" +} + +@test "corrupt JSONL lines are skipped and valid entries still appear" { + printf 'this is not json\n' >> "$TRIMKIT_SYSOPS_LEARNINGS_FILE" + write_entry "$ENTRY_PULSE_A" + run bash "$SCRIPT" + assert_success + assert_output --partial "caddy-restart" +} + +@test "corrupt JSONL lines emit a warning on stderr" { + printf 'not
json\n' >> "$TRIMKIT_SYSOPS_LEARNINGS_FILE" + run bash "$SCRIPT" 2>&1 + assert_success + assert_output --partial "warning" +} + +@test "same key on different deployments are not deduplicated against each other" { + # Both Pulse and Curia have key "shared-key"; both should survive + pulse='{"ts":"2026-04-10T10:00:00Z","deployment":"Pulse","key":"shared-key","type":"quirk","insight":"Pulse insight.","confidence":0.9,"source":"observed"}' + curia='{"ts":"2026-04-10T11:00:00Z","deployment":"Curia","key":"shared-key","type":"quirk","insight":"Curia insight.","confidence":0.8,"source":"observed"}' + write_entry "$pulse" + write_entry "$curia" + run bash "$SCRIPT" + assert_success + line_count="$(echo "$output" | grep -c .)" + assert_equal "$line_count" "2" +} + +@test "deployment filter with shared key only returns matching deployment" { + pulse='{"ts":"2026-04-10T10:00:00Z","deployment":"Pulse","key":"shared-key","type":"quirk","insight":"Pulse insight.","confidence":0.9,"source":"observed"}' + curia='{"ts":"2026-04-10T11:00:00Z","deployment":"Curia","key":"shared-key","type":"quirk","insight":"Curia insight.","confidence":0.8,"source":"observed"}' + write_entry "$pulse" + write_entry "$curia" + run bash "$SCRIPT" --deployment Pulse + assert_success + assert_output --partial "Pulse insight." + refute_output --partial "Curia insight." 
+} + +@test "exits 0 on empty learnings file" { + touch "$TRIMKIT_SYSOPS_LEARNINGS_FILE" + run bash "$SCRIPT" + assert_success + assert_output "" +} + +@test "skips blank lines in the learnings file" { + printf '\n' >> "$TRIMKIT_SYSOPS_LEARNINGS_FILE" + write_entry "$ENTRY_PULSE_A" + printf '\n' >> "$TRIMKIT_SYSOPS_LEARNINGS_FILE" + run bash "$SCRIPT" + assert_success + line_count="$(echo "$output" | grep -c .)" + assert_equal "$line_count" "1" +} + +# ── --human flag tests ────────────────────────────────────────────────────── + +@test "--human shows formatted output with deployment, type, key, confidence, and insight" { + write_entry "$ENTRY_PULSE_A" + run bash "$SCRIPT" --human + assert_success + assert_output --partial "Pulse" + assert_output --partial "[quirk]" + assert_output --partial "caddy-restart" + assert_output --partial "confidence: 0.9" + assert_output --partial "Caddy needs restart after upgrade." +} + +@test "--human with --deployment filters correctly" { + write_entry "$ENTRY_PULSE_A" + write_entry "$ENTRY_CURIA" + run bash "$SCRIPT" --human --deployment Pulse + assert_success + assert_output --partial "caddy-restart" + refute_output --partial "postgres-slow-start" +} + +@test "--human shows 'No learnings stored yet' when file does not exist" { + run bash "$SCRIPT" --human + assert_success + assert_output --partial "No learnings stored yet" +} + +@test "--human shows 'No learnings found' for deployment with no matches" { + write_entry "$ENTRY_PULSE_A" + run bash "$SCRIPT" --human --deployment Nonexistent + assert_success + assert_output --partial "No learnings found" + assert_output --partial "Nonexistent" +} + +@test "--human with corrupt lines still shows warning on stderr" { + printf 'not json\n' >> "$TRIMKIT_SYSOPS_LEARNINGS_FILE" + write_entry "$ENTRY_PULSE_A" + run bash "$SCRIPT" --human 2>&1 + assert_success + assert_output --partial "warning" + assert_output --partial "caddy-restart" +} + +@test "--human deduplicates and shows latest entry" { + write_entry
"$ENTRY_PULSE_A" + write_entry "$ENTRY_PULSE_B" + run bash "$SCRIPT" --human + assert_success + assert_output --partial "Caddy restart confirmed again." + refute_output --partial "Caddy needs restart after upgrade." +} + +@test "--human on empty file shows 'No learnings found'" { + touch "$TRIMKIT_SYSOPS_LEARNINGS_FILE" + run bash "$SCRIPT" --human + assert_success + assert_output --partial "No learnings found" +} diff --git a/tests/bin/trimkit-sysops-log-search.bats b/tests/bin/trimkit-sysops-log-search.bats new file mode 100644 index 0000000..8e8b3e7 --- /dev/null +++ b/tests/bin/trimkit-sysops-log-search.bats @@ -0,0 +1,277 @@ +#!/usr/bin/env bats + +# tests/bin/trimkit-sysops-log-search.bats — tests for bin/trimkit-sysops-log-search + +setup() { + load '../test_helper/bats-support/load' + load '../test_helper/bats-assert/load' + SCRIPT="$BATS_TEST_DIRNAME/../../bin/trimkit-sysops-log-search" + TMPDIR_CUSTOM="$(mktemp -d)" + export TRIMKIT_SYSOPS_LOG_DIR="$TMPDIR_CUSTOM" + export TRIMKIT_SYSOPS_LOG_FILE="$TMPDIR_CUSTOM/audit.jsonl" +} + +teardown() { + rm -rf "$TMPDIR_CUSTOM" +} + +# Write an audit entry directly to the file (bypassing trimkit-sysops-log). 
+write_entry() { + printf '%s\n' "$1" >> "$TRIMKIT_SYSOPS_LOG_FILE" +} + +# Sample audit entries — minimal set of fields for testing +ENTRY_STATUS='{"ts":"2026-04-10T10:00:00Z","session":"sess-1","project":"curia","deployment":"Pulse","env":"prod","action":"status_check","containers":{"pulse":"healthy","caddy":"healthy"},"unregistered_containers":[],"notes":""}' +ENTRY_MAINT='{"ts":"2026-04-11T12:00:00Z","session":"sess-2","project":"curia","deployment":"Pulse","env":"prod","action":"maintenance","packages_upgraded":5,"reboot_performed":true,"reboot_duration_s":42,"containers":{"pulse":"healthy","caddy":"unhealthy"},"unregistered_containers":["redis-temp"],"notes":"caddy needed manual restart"}' +ENTRY_CURIA='{"ts":"2026-04-12T08:00:00Z","session":"sess-3","project":"curia","deployment":"Curia","env":"prod","action":"status_check","containers":{"postgres":"healthy"},"unregistered_containers":[],"notes":""}' + +# ── Basic functionality ───────────────────────────────────────────────────── + +@test "script exists and is executable" { + [ -x "$SCRIPT" ] +} + +@test "exits 0 with no output when log file does not exist" { + run bash "$SCRIPT" + assert_success + assert_output "" +} + +@test "outputs all entries as JSONL when no filter given" { + write_entry "$ENTRY_STATUS" + write_entry "$ENTRY_MAINT" + write_entry "$ENTRY_CURIA" + run bash "$SCRIPT" + assert_success + line_count="$(echo "$output" | grep -c .)" + assert_equal "$line_count" "3" +} + +@test "each output line is valid JSON" { + write_entry "$ENTRY_STATUS" + write_entry "$ENTRY_MAINT" + run bash "$SCRIPT" + assert_success + echo "$output" | python3 -c " +import json, sys +for line in sys.stdin: + line = line.strip() + if line: + json.loads(line) +" +} + +# ── --deployment filter ───────────────────────────────────────────────────── + +@test "filters by deployment" { + write_entry "$ENTRY_STATUS" + write_entry "$ENTRY_CURIA" + run bash "$SCRIPT" --deployment Pulse + assert_success + assert_output --partial 
"Pulse" + refute_output --partial "Curia" +} + +@test "deployment filter is case-insensitive" { + write_entry "$ENTRY_STATUS" + run bash "$SCRIPT" --deployment pulse + assert_success + assert_output --partial "Pulse" +} + +@test "returns no output when deployment filter matches nothing" { + write_entry "$ENTRY_STATUS" + run bash "$SCRIPT" --deployment Nonexistent + assert_success + assert_output "" +} + +# ── --last limit ──────────────────────────────────────────────────────────── + +@test "default limit is 10 entries" { + # Write 12 entries; should only get last 10 + for i in $(seq 1 12); do + write_entry "{\"ts\":\"2026-04-${i}T00:00:00Z\",\"session\":\"s-$i\",\"project\":\"p\",\"deployment\":\"D\",\"env\":\"prod\",\"action\":\"status_check\",\"containers\":{},\"unregistered_containers\":[],\"notes\":\"entry $i\"}" + done + run bash "$SCRIPT" + assert_success + line_count="$(echo "$output" | grep -c .)" + assert_equal "$line_count" "10" + # First two entries (1, 2) should be trimmed; entry 3 should be present + assert_output --partial "entry 3" + refute_output --partial "entry 1\"" +} + +@test "--last limits output to N entries" { + write_entry "$ENTRY_STATUS" + write_entry "$ENTRY_MAINT" + write_entry "$ENTRY_CURIA" + run bash "$SCRIPT" --last 2 + assert_success + line_count="$(echo "$output" | grep -c .)" + assert_equal "$line_count" "2" +} + +@test "--last with --deployment filters first then limits" { + write_entry "$ENTRY_STATUS" + write_entry "$ENTRY_MAINT" + write_entry "$ENTRY_CURIA" + # Only 2 Pulse entries exist; --last 1 should give the most recent one + run bash "$SCRIPT" --deployment Pulse --last 1 + assert_success + line_count="$(echo "$output" | grep -c .)" + assert_equal "$line_count" "1" + assert_output --partial "maintenance" +} + +# ── Error handling ────────────────────────────────────────────────────────── + +@test "exits 1 on unknown argument" { + run bash "$SCRIPT" --unknown + assert_failure + assert_output --partial "error: unknown 
argument" +} + +@test "exits 1 when --deployment given without value" { + run bash "$SCRIPT" --deployment + assert_failure + assert_output --partial "error:" +} + +@test "exits 1 when --deployment given an empty string" { + run bash "$SCRIPT" --deployment "" + assert_failure + assert_output --partial "error:" +} + +@test "exits 1 when --last given without value" { + run bash "$SCRIPT" --last + assert_failure + assert_output --partial "error:" +} + +@test "exits 1 when --last given a non-integer" { + run bash "$SCRIPT" --last abc + assert_failure + assert_output --partial "error:" + assert_output --partial "positive integer" +} + +@test "exits 1 when --last given zero" { + run bash "$SCRIPT" --last 0 + assert_failure + assert_output --partial "error:" +} + +@test "exits 1 when --last given negative number" { + run bash "$SCRIPT" --last -5 + assert_failure + assert_output --partial "error:" +} + +# ── Corrupt lines ─────────────────────────────────────────────────────────── + +@test "corrupt JSONL lines are skipped and valid entries still appear" { + printf 'this is not json\n' >> "$TRIMKIT_SYSOPS_LOG_FILE" + write_entry "$ENTRY_STATUS" + run bash "$SCRIPT" + assert_success + assert_output --partial "status_check" +} + +@test "corrupt JSONL lines emit a warning on stderr" { + printf 'not json\n' >> "$TRIMKIT_SYSOPS_LOG_FILE" + run bash "$SCRIPT" 2>&1 + assert_success + assert_output --partial "warning" +} + +@test "exits 0 on empty log file" { + touch "$TRIMKIT_SYSOPS_LOG_FILE" + run bash "$SCRIPT" + assert_success + assert_output "" +} + +@test "skips blank lines in the log file" { + printf '\n' >> "$TRIMKIT_SYSOPS_LOG_FILE" + write_entry "$ENTRY_STATUS" + printf '\n' >> "$TRIMKIT_SYSOPS_LOG_FILE" + run bash "$SCRIPT" + assert_success + line_count="$(echo "$output" | grep -c .)" + assert_equal "$line_count" "1" +} + +# ── --human flag ──────────────────────────────────────────────────────────── + +@test "--human shows formatted output for status check" { + 
write_entry "$ENTRY_STATUS" + run bash "$SCRIPT" --human + assert_success + assert_output --partial "Pulse [prod]" + assert_output --partial "status_check" + assert_output --partial "Containers:" + assert_output --partial "Unregistered: none" +} + +@test "--human shows maintenance details" { + write_entry "$ENTRY_MAINT" + run bash "$SCRIPT" --human + assert_success + assert_output --partial "maintenance" + assert_output --partial "Packages upgraded: 5" + assert_output --partial "Reboot: yes (42s)" + assert_output --partial "redis-temp" + assert_output --partial "Notes: caddy needed manual restart" +} + +@test "--human with --deployment filters correctly" { + write_entry "$ENTRY_STATUS" + write_entry "$ENTRY_CURIA" + run bash "$SCRIPT" --human --deployment Curia + assert_success + assert_output --partial "Curia" + refute_output --partial "Pulse" +} + +@test "--human shows 'No audit log found' when file does not exist" { + run bash "$SCRIPT" --human + assert_success + assert_output --partial "No audit log found" +} + +@test "--human shows 'No entries found' when deployment matches nothing" { + write_entry "$ENTRY_STATUS" + run bash "$SCRIPT" --human --deployment Nonexistent + assert_success + assert_output --partial "No entries found" + assert_output --partial "Nonexistent" +} + +@test "--human with corrupt lines shows warning" { + printf 'not json\n' >> "$TRIMKIT_SYSOPS_LOG_FILE" + write_entry "$ENTRY_STATUS" + run bash "$SCRIPT" --human 2>&1 + assert_success + assert_output --partial "warning" + assert_output --partial "Pulse" +} + +@test "--human on empty file shows 'No entries found'" { + touch "$TRIMKIT_SYSOPS_LOG_FILE" + run bash "$SCRIPT" --human + assert_success + assert_output --partial "No entries found" +} + +@test "--human respects --last limit" { + write_entry "$ENTRY_STATUS" + write_entry "$ENTRY_MAINT" + write_entry "$ENTRY_CURIA" + run bash "$SCRIPT" --human --last 1 + assert_success + # Only the last entry (Curia) should appear + assert_output 
--partial "Curia" + refute_output --partial "maintenance" +}
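
Reviewer note: the `trimkit-learnings-search` tests in this diff pin down a small contract. Blank lines are skipped, corrupt lines produce a warning on stderr without failing the run, the deployment filter is case-insensitive, and the latest entry per `(deployment, key)` pair wins. The Python sketch below shows one way that contract could be satisfied. It is not the script's actual implementation: `search_learnings` is a hypothetical name, "latest" is assumed to mean last in file order rather than highest `ts`, and lowercasing the deployment in the dedup key is also an assumption, since the tests do not cover either point.

```python
import json
import sys


def search_learnings(lines, deployment=None):
    """Return the surviving entry per (deployment, key) pair.

    Hypothetical helper: the name and signature are illustrative only.
    """
    latest = {}
    for lineno, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue  # blank lines are skipped silently
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            # Corrupt lines are reported on stderr but never fatal.
            print(f"warning: skipping corrupt line {lineno}", file=sys.stderr)
            continue
        if not isinstance(entry, dict):
            print(f"warning: skipping non-object line {lineno}", file=sys.stderr)
            continue
        dep = str(entry.get("deployment", ""))
        if deployment is not None and dep.lower() != deployment.lower():
            continue  # case-insensitive deployment filter
        # Assumption: a later line overwrites an earlier one for the same pair.
        latest[(dep.lower(), entry.get("key"))] = entry
    return list(latest.values())
```

Keying on the pair rather than on `key` alone is what lets the "same key on different deployments" test keep both entries.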