ik-labs/splunk-app-lifecycle-copilot

Splunk App Lifecycle Copilot

Splunk App Lifecycle Copilot is a hackathon MVP for carrying a Splunk app from raw logs to a CIM-clean, AppInspect-green package with self-healing loops and an auditable provenance trail.

Positioning: Splunk's own AI can explain failures. This project resolves, validates, and remembers them.

Judges: 60-Second Path

Fastest look — no Splunk, no MCP, no install (needs only Bun):

make dashboard          # cd ui/dashboard && bun install && bun run dev

Open the printed URL. The dashboard opens on a Lifecycle overview of the self-heal loops; from there, drill into the Onboarding and AppInspect stages. The Provenance Ledger panel makes the "resolve, validate, and remember" thesis literal: it shows every diagnosis, patch, rationale, and validation result from a verified run, replayed from committed demo events.

Run the real software end-to-end:

make setup              # Python 3.13 venv + install (see Requirements re: 3.13)
make demo               # runs the loops, prints where every artifact landed

make demo always runs the dependency-free AppInspect loop. It also runs the live onboarding loop (HEC ingest → MCP splunk_run_query validation) when .env carries the Splunk + MCP credentials; otherwise it points back to the zero-deps dashboard replay. make help lists every target.

Hackathon Scope

  • Event: Splunk Agentic Ops Hackathon
  • Track: Platform & Developer Experience
  • Bonus target: Best Use of Splunk MCP Server
  • Submission deadline: June 15, 2026, 9:00 AM PDT

The MVP builds three loops on one shared self-heal engine:

  • Stage 1, onboarding: raw UPI/GST-style logs -> inline rex / eval extraction candidates -> validation against real Splunk events through splunk_run_query -> final props.conf / transforms.conf only after convergence.
  • Stage 2, AppInspect: deliberately broken app -> AppInspect JSON failures -> deterministic patch functions -> re-run until green.
  • Stage 4, cost-aware SPL lint: a deliberately costly search -> deterministic cost findings (index=*, no time bound, unbounded | sort) -> deterministic rewrites -> re-lint until clean. Static analysis, no live Splunk.

Stages 3 (scaffold + test data) and 5 (dashboard migration) remain architecture-only future extensions.
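
As a rough illustration of what "one shared self-heal engine" could look like, here is a minimal sketch of a validate, patch, re-validate loop. Every name in it (Finding, LoopResult, self_heal, the toy validator and patchers) is hypothetical and chosen for this example; it is not the project's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Finding:
    check: str    # identifier of the failed check
    detail: str   # human-readable description of the failure


@dataclass
class LoopResult:
    converged: bool
    iterations: int
    provenance: list[dict] = field(default_factory=list)


def self_heal(
    state: dict,
    validate: Callable[[dict], list[Finding]],
    patchers: dict[str, Callable[[dict], dict]],
    max_iterations: int = 5,
) -> LoopResult:
    """Validate, apply one deterministic patch, and repeat until clean."""
    result = LoopResult(converged=False, iterations=0)
    for iteration in range(1, max_iterations + 1):
        result.iterations = iteration
        findings = validate(state)
        if not findings:
            result.converged = True
            return result
        finding = findings[0]
        patcher = patchers.get(finding.check)
        if patcher is None:
            # No deterministic patch is registered: stop instead of guessing.
            return result
        state.update(patcher(state))
        result.provenance.append(
            {"iteration": iteration, "check": finding.check, "detail": finding.detail}
        )
    # Iteration budget exhausted without a clean validation pass.
    return result


def toy_validate(state: dict) -> list[Finding]:
    """Toy validator: every flag that is still True counts as a failure."""
    return [Finding(name, f"{name} is present") for name, bad in state.items() if bad]


toy_patchers = {
    "local_dir": lambda state: {"local_dir": False},
    "user_seed": lambda state: {"user_seed": False},
}

outcome = self_heal({"local_dir": True, "user_seed": True}, toy_validate, toy_patchers)
print(outcome.converged, outcome.iterations, len(outcome.provenance))
```

Each stage would then differ only in the validator and patchers it plugs in, which is one plausible reading of the "three loops, one engine" claim.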

Current Repo Contents

  • SCOPE.md: source-of-truth scope, build sequencing, risks, and submission checklist.
  • UX_DEMO_PLAN.md: dashboard/CLI/IDE surfaces and the agent-to-UI event contract.
  • architecture_diagram.md: required architecture diagram artifact.
  • demo/architecture_demo.md: technical, presentation-ready architecture walkthrough (system overview, the self-heal engine, live-MCP onboarding, and live-mode SSE — five Mermaid diagrams) for the demo.
  • docker-compose.yml: local Splunk Enterprise container with HEC enabled.
  • smoke_test.py: Day-1 Splunk SDK, HEC, search, and AppInspect smoke test.
  • fixtures/onboarding/sample_upi.log: 150-line synthetic UPI transaction fixture.
  • fixtures/appinspect/broken_app: AppInspect failure fixture for the first build milestone.

Requirements

  • Python 3.13 (use 3.13 specifically, not 3.14 — see note below)
  • Docker Desktop
  • Homebrew libmagic on macOS for splunk-appinspect
  • Splunk Enterprise Docker image (splunk/splunk:latest)
  • Splunk MCP Server app for the onboarding loop
  • Optional: Splunk AI Assistant for saia_generate_spl, saia_explain_spl, and saia_optimize_spl

Python dependency notes:

  • PyPI package is splunk-sdk; import name is splunklib.
  • splunklib.ai requires splunk-sdk>=3.0.0, which requires Python 3.13+.
  • Pin the interpreter to Python 3.13, not 3.14. On 3.14, AppInspect's bundled Python static analyzer fails to initialize and reports every Python check as an error ("Python analyzer is failed in initialization"). That leaves the AppInspect self-heal loop unable to reach a clean result even after the three real failures are patched. On macOS: brew install python@3.13 and build the venv with /opt/homebrew/opt/python@3.13/bin/python3.13 -m venv .venv.
  • splunk_run_query is the required MCP validation tool. saia_* tools are optional graceful enhancement only.

Setup

python3.13 -m venv .venv
source .venv/bin/activate
pip install -e .
cp .env.example .env

Edit .env with your local values. The MCP token must be the encrypted token generated by the Splunk MCP Server app. It is not a plain Splunk REST bearer token. Do not build the demo around OAuth; OAuth for Splunk MCP is Controlled Access / closed preview.

On macOS, install AppInspect's system dependency if needed:

brew install libmagic

Start Splunk:

docker compose up -d

Ports are driven by .env (SPLUNK_WEB_PORT, SPLUNK_HEC_PORT, SPLUNK_MGMT_PORT). If host port 8088 is already in use, set a different SPLUNK_HEC_PORT (the local dev box uses 18088); the onboarding loop reads the same value, so HEC ingest stays in sync.

Wait for the container to become healthy, then run:

python smoke_test.py

AppInspect Fixture Check

The current AppInspect fixture is designed to fire three deterministic failures:

  • check_that_local_does_not_exist: forbidden local/ directory.
  • check_user_seed_conf_deny_list: forbidden default/user-seed.conf.
  • check_if_outputs_conf_exists: forwarding enabled in default/outputs.conf.
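
The three failures above map naturally onto small deterministic patch functions. The sketch below shows one way that could work on a throwaway working copy. The function names, the PATCHERS table, and the decision to delete default/outputs.conf outright are assumptions for illustration; the project's real patchers may behave differently (for example, by editing the forwarding stanza instead).

```python
import shutil
import tempfile
from pathlib import Path


def remove_local_dir(app: Path) -> list[str]:
    """Delete the forbidden local/ directory if it exists."""
    target = app / "local"
    if target.is_dir():
        shutil.rmtree(target)
        return ["local/"]
    return []


def remove_user_seed(app: Path) -> list[str]:
    """Delete the forbidden default/user-seed.conf if it exists."""
    target = app / "default" / "user-seed.conf"
    if target.is_file():
        target.unlink()
        return ["default/user-seed.conf"]
    return []


def remove_outputs_conf(app: Path) -> list[str]:
    """Assumed fix: drop default/outputs.conf entirely."""
    target = app / "default" / "outputs.conf"
    if target.is_file():
        target.unlink()
        return ["default/outputs.conf"]
    return []


# Hypothetical mapping from AppInspect check name to patch function.
PATCHERS = {
    "check_that_local_does_not_exist": remove_local_dir,
    "check_user_seed_conf_deny_list": remove_user_seed,
    "check_if_outputs_conf_exists": remove_outputs_conf,
}


def heal_working_copy(fixture: Path, work_root: Path, failed: list[str]) -> list[str]:
    """Copy the fixture, patch only the copy, and return the changed paths."""
    work = work_root / fixture.name
    shutil.copytree(fixture, work)
    changed: list[str] = []
    for check in failed:
        changed.extend(PATCHERS[check](work))
    return changed


def make_broken_fixture(root: Path) -> Path:
    """Create a tiny stand-in for the broken app (not the real fixture)."""
    app = root / "broken_app"
    (app / "local").mkdir(parents=True)
    (app / "default").mkdir()
    (app / "local" / "app.conf").write_text("[ui]\n")
    (app / "default" / "app.conf").write_text("[launcher]\n")
    (app / "default" / "user-seed.conf").write_text("[user_info]\n")
    (app / "default" / "outputs.conf").write_text("[tcpout]\n")
    return app


with tempfile.TemporaryDirectory() as tmp:
    fixture = make_broken_fixture(Path(tmp) / "fixtures")
    changed = heal_working_copy(fixture, Path(tmp) / "work", list(PATCHERS))
    print(changed)
```

Because each function returns the paths it touched, the caller can record exactly what changed without inspecting the filesystem afterwards.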

Run the self-heal loop:

copilot appinspect fixtures/appinspect/broken_app --out runs/appinspect-demo

The command copies the fixture into runs/appinspect-demo/work/broken_app, patches only that working copy, and writes:

  • runs/appinspect-demo/appinspect/iteration-XX.json
  • runs/appinspect-demo/events.jsonl
  • runs/appinspect-demo/events.json
  • runs/appinspect-demo/provenance.jsonl
  • runs/appinspect-demo/summary.json

Validate the fixture directly with the AppInspect CLI:

splunk-appinspect inspect fixtures/appinspect/broken_app \
  --mode test \
  --data-format json \
  --output-file /tmp/broken_app_result.json

Expected summary:

failure: 3
error: 0
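
To check that summary programmatically, something like the sketch below could tally results from the JSON report. The reports / groups / checks nesting and the name / result keys are my assumption about the report layout, and SAMPLE_REPORT is a hand-written stand-in, so compare against a real /tmp/broken_app_result.json before relying on it.

```python
from collections import Counter

# Hand-written stand-in for an AppInspect JSON report (assumed layout).
SAMPLE_REPORT = {
    "reports": [
        {
            "groups": [
                {
                    "checks": [
                        {"name": "check_that_local_does_not_exist", "result": "failure"},
                        {"name": "check_user_seed_conf_deny_list", "result": "failure"},
                        {"name": "check_if_outputs_conf_exists", "result": "failure"},
                        {"name": "check_for_example_only", "result": "success"},
                    ]
                }
            ]
        }
    ]
}


def iter_checks(report: dict):
    """Yield every check dict found under reports -> groups -> checks."""
    for app_report in report.get("reports", []):
        for group in app_report.get("groups", []):
            yield from group.get("checks", [])


def result_counts(report: dict) -> Counter:
    """Count checks per result value (failure, success, ...)."""
    return Counter(check.get("result") for check in iter_checks(report))


def failed_checks(report: dict) -> list[str]:
    """Return the names of all checks whose result is 'failure'."""
    return [c["name"] for c in iter_checks(report) if c.get("result") == "failure"]


counts = result_counts(SAMPLE_REPORT)
print(counts["failure"], counts["error"])
print(failed_checks(SAMPLE_REPORT))
```

To use it on a real run, load the file with json.load and pass the resulting dict to the same functions.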

Splunk MCP Server Setup (onboarding loop only)

The onboarding loop is the only path that needs the Splunk MCP Server app and an encrypted token. The static loops (AppInspect, SPL lint), the dashboard, and Live mode need none of this — skip this section if you are not running onboarding.

Verified against Splunk MCP Server v1.2.0 on splunk/splunk:latest. The commands below use $SPLUNK_MGMT_PORT (default 8089) and the admin password from .env.

1. Install the app. Download "Splunk MCP Server" from Splunkbase, then in Splunk Web go to Apps -> Manage Apps -> Install app from file and upload the .tar.gz (or docker cp it into $SPLUNK_HOME/etc/apps/ and restart the container). Confirm it is installed and enabled:

curl -sk -u "admin:$SPLUNK_PASSWORD" \
  "https://localhost:${SPLUNK_MGMT_PORT:-8089}/services/apps/local/Splunk_MCP_Server?output_mode=json" \
  | python3 -c "import sys,json;e=json.load(sys.stdin)['entry'][0]['content'];print('enabled' if not e.get('disabled') else 'DISABLED','v'+str(e.get('version')))"

2. Enable token authentication (once; idempotent):

curl -sk -u "admin:$SPLUNK_PASSWORD" -X POST \
  "https://localhost:${SPLUNK_MGMT_PORT:-8089}/services/admin/token-auth/tokens_auth" -d disabled=0

3. Mint the encrypted token. The app RSA-encrypts a JWT (audience mcp); mcp.conf sets require_encrypted_token = true, so this is not a plain REST bearer token. The + in the relative expiry must be URL-encoded as %2B or the app rejects the request:

curl -sk -u "admin:$SPLUNK_PASSWORD" \
  "https://localhost:${SPLUNK_MGMT_PORT:-8089}/services/mcp_token?username=admin&expires_on=%2B30d"
# -> {"token": "<encrypted-token>"}   (valid 30 days)
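
The %2B requirement in step 3 is ordinary URL encoding: in a query string, a literal + is decoded as a space. The sketch below uses only the Python standard library to show both directions. The helper name mcp_token_url is hypothetical; the path and parameter names are the ones used in the curl command.

```python
from urllib.parse import parse_qs, urlencode


def mcp_token_url(base: str, username: str, expires_on: str) -> str:
    """Build the token URL with a correctly percent-encoded query string."""
    query = urlencode({"username": username, "expires_on": expires_on})
    return f"{base}/services/mcp_token?{query}"


# urlencode turns the leading "+" into "%2B".
print(mcp_token_url("https://localhost:8089", "admin", "+30d"))

# An unencoded "+" is read back as a space, so the server would see " 30d".
print(parse_qs("expires_on=+30d"))

# The encoded form round-trips to the intended value.
print(parse_qs("expires_on=%2B30d"))
```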

4. Put the values in .env:

SPLUNK_MCP_ENDPOINT=https://localhost:8089/services/mcp
SPLUNK_MCP_ENCRYPTED_TOKEN=<the token value from step 3>
SPLUNK_MCP_TLS_VERIFY=false   # local self-signed Docker dev

The token expires after 30 days; re-mint by repeating steps 2-3. The saia_* tools are optional and only appear when Splunk AI Assistant is installed — the onboarding loop succeeds with splunk_run_query alone.

Onboarding MCP Slice

The onboarding slice is live-only: it ingests fixtures/onboarding/sample_upi.log through HEC, validates inline SPL candidates with MCP splunk_run_query, and hard-fails if the Splunk MCP Server app or the splunk_run_query tool is unavailable. It does not yet use Splunk AI Assistant or a Splunk SDK search fallback, and it does not yet generate props.conf / transforms.conf.

Required .env values:

  • SPLUNK_HEC_TOKEN
  • SPLUNK_ONBOARDING_INDEX=main
  • SPLUNK_ONBOARDING_SOURCETYPE=upi_gateway_raw
  • SPLUNK_MCP_ENDPOINT
  • SPLUNK_MCP_ENCRYPTED_TOKEN
  • SPLUNK_MCP_TLS_VERIFY=false for local self-signed Docker dev

Run it:

copilot onboard fixtures/onboarding/sample_upi.log --out runs/onboarding-demo

Expected flow: candidate-00 fails coverage checks, the deterministic patcher switches to candidate-01, and MCP revalidation passes. Six CIM mapping events and two PII flag events are then written for dashboard replay.
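
The candidate-switching idea can be illustrated offline. In the real loop, coverage is measured in Splunk through splunk_run_query; the sketch below swaps that for local regex matching so it runs anywhere. The sample lines, field names, regexes, and the 100% threshold are all invented for this example and are not taken from the fixture.

```python
import re

# Invented sample lines; the real fixture format may differ.
SAMPLE_LINES = [
    "2026-01-05T10:00:01Z txn_id=T1001 amount=250.00 status=SUCCESS",
    "2026-01-05T10:00:02Z txn_id=T1002 amount=99 status=FAILED",
    "2026-01-05T10:00:03Z txn_id=T1003 amount=1200.50 status=SUCCESS",
]

CANDIDATES = [
    # candidate-00: requires a decimal point, so it misses integer amounts.
    re.compile(r"txn_id=(?P<txn_id>\w+) amount=(?P<amount>\d+\.\d+) status=(?P<status>\w+)"),
    # candidate-01: makes the fractional part optional.
    re.compile(r"txn_id=(?P<txn_id>\w+) amount=(?P<amount>\d+(?:\.\d+)?) status=(?P<status>\w+)"),
]


def coverage(pattern: re.Pattern, lines: list[str]) -> float:
    """Fraction of lines from which the pattern extracts all its fields."""
    hits = sum(1 for line in lines if pattern.search(line))
    return hits / len(lines)


def first_converging(candidates, lines, threshold: float = 1.0):
    """Return the index of the first candidate that meets the threshold."""
    for index, pattern in enumerate(candidates):
        if coverage(pattern, lines) >= threshold:
            return index
    return None


for index, pattern in enumerate(CANDIDATES):
    print(f"candidate-{index:02d} coverage={coverage(pattern, SAMPLE_LINES):.2f}")
print("selected:", first_converging(CANDIDATES, SAMPLE_LINES))
```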

Cost-Aware SPL Lint (Stage 4)

The SPL lint loop is static like AppInspect — no live Splunk required. It lints a search for cost anti-patterns, heals each with a deterministic rewrite, and re-lints until clean.

copilot lint fixtures/spl_lint/costly_search.spl --out runs/spl-lint-demo

The fixture fires three findings, each healed in its own iteration:

  • spl_wildcard_index: index=* scans every index -> rewritten to index=main.
  • spl_all_time: no earliest/latest bound -> prepended earliest=-24h.
  • spl_unbounded_sort: | sort with no limit -> capped at | sort 1000.

It writes the same artifact set as the other loops (spl/iteration-XX.json, events.json, provenance.jsonl, summary.json), so the run drops straight into the dashboard's SPL Lint stage.
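
For readers who want a feel for how such a lint-and-rewrite loop might work, here is a simplified sketch of the three findings and their rewrites. The regexes, function names, and the sample search are my own assumptions, not the project's rules or its fixture, and real SPL has many cases these patterns do not cover.

```python
import re

WILDCARD_INDEX = re.compile(r"\bindex\s*=\s*\*")
TIME_BOUND = re.compile(r"\b(?:earliest|latest)\s*=")
# "| sort" not followed by a numeric limit.
UNBOUNDED_SORT = re.compile(r"(\|\s*sort)\b(?!\s+\d)")


def lint(spl: str) -> list[str]:
    """Return the finding ids that apply to the search, in a fixed order."""
    findings = []
    if WILDCARD_INDEX.search(spl):
        findings.append("spl_wildcard_index")
    if not TIME_BOUND.search(spl):
        findings.append("spl_all_time")
    if UNBOUNDED_SORT.search(spl):
        findings.append("spl_unbounded_sort")
    return findings


def heal(spl: str, finding: str) -> str:
    """Apply the deterministic rewrite for a single finding."""
    if finding == "spl_wildcard_index":
        return WILDCARD_INDEX.sub("index=main", spl, count=1)
    if finding == "spl_all_time":
        return f"earliest=-24h {spl}"
    if finding == "spl_unbounded_sort":
        return UNBOUNDED_SORT.sub(r"\1 1000", spl, count=1)
    raise ValueError(f"no rewrite registered for {finding}")


def lint_until_clean(spl: str, max_iterations: int = 5) -> tuple[str, list[str]]:
    """Heal one finding per iteration and re-lint until nothing is left."""
    healed = []
    for _ in range(max_iterations):
        findings = lint(spl)
        if not findings:
            break
        spl = heal(spl, findings[0])
        healed.append(findings[0])
    return spl, healed


costly = "index=* | stats count by user | sort - count"  # invented example
clean, healed = lint_until_clean(costly)
print(clean)
print(healed)
```

Healing one finding per iteration mirrors the "each healed in its own iteration" behavior described above and keeps every rewrite individually attributable.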

Demo Flow

  1. Terminal starts the agent and proves this is real software.
  2. Dashboard shows onboarding: fields appear as splunk_run_query validates inline extraction candidates against the 150-line fixture, PII is flagged, and CIM mapping converges.
  3. Dashboard shows AppInspect: three red failures are diagnosed, patched by deterministic functions, and revalidated to green.
  4. VS Code cutaway shows the same agent entry point from the IDE.
  5. Provenance ledger shows every diagnosis, patch, rationale, and validation result.

Dashboard Replay Mode

The dashboard replays all three self-heal loops. It opens on a Lifecycle overview that summarizes every loop side by side (status, failures healed, iterations, MCP calls) to make the one-engine/many-loops thesis legible at a glance. From there, use the sidebar to open the Onboarding, AppInspect, or SPL Lint stage; each renders committed demo events (demo/onboarding_events.json, demo/appinspect_events.json, demo/spl_lint_events.json) and requires no Splunk, MCP, or live server connection. The onboarding stage adds an MCP tool-call count and a CIM-mapping / PII panel sourced from a verified live run.

Every stage also renders a Provenance Ledger panel: the complete, durable audit trail read from the persisted *_provenance.jsonl — each entry's diagnosis, patch, rationale, validation result, changed paths, and timestamp. That panel is the "and remember" half of the thesis, and it is exactly what a reviewer would inspect to trust an automated fix.

cd ui/dashboard
bun install
bun run dev

Verification:

bun run test
bun run build

Live Mode

The dashboard can also stream a self-heal loop as it runs, instead of replaying committed events. Start the SSE server, then click Go Live on the AppInspect or SPL Lint stage:

copilot serve            # or: make serve  (defaults to 127.0.0.1:8765)

The server (lifecycle_copilot.server) runs the chosen static loop — AppInspect or SPL lint, neither needs a live Splunk — in a background thread and streams each event over Server-Sent Events. The browser feeds those events through the same reducer used for replay, so live and replay render identically; only the event source differs. Point the dashboard at a non-default server with VITE_LIVE_URL. Onboarding is replay-only here because it requires Splunk + MCP.
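
The "same reducer, different event source" idea can be sketched in a few lines. The code below frames events as Server-Sent Events text, parses them back, and feeds both the replay list and the parsed stream through one reducer. The event shapes (type, status) and all function names are hypothetical, and the dashboard's real reducer lives in the UI code rather than in Python.

```python
import json


def to_sse(event: dict) -> str:
    """Serialize one event as an SSE frame: a data line plus a blank line."""
    return f"data: {json.dumps(event)}\n\n"


def parse_sse(stream: str) -> list[dict]:
    """Parse SSE text back into event dicts (data fields only)."""
    events = []
    for frame in stream.split("\n\n"):
        data_lines = [
            line[len("data:"):].removeprefix(" ")
            for line in frame.split("\n")
            if line.startswith("data:")
        ]
        if data_lines:
            events.append(json.loads("\n".join(data_lines)))
    return events


def reduce_events(events: list[dict]) -> dict:
    """Fold events into the state a stage view would render."""
    state = {"patches": 0, "status": "running"}
    for event in events:
        if event["type"] == "patch_applied":
            state["patches"] += 1
        elif event["type"] == "run_finished":
            state["status"] = event["status"]
    return state


replay_events = [
    {"type": "patch_applied", "check": "example_check"},
    {"type": "patch_applied", "check": "another_check"},
    {"type": "run_finished", "status": "green"},
]

live_stream = "".join(to_sse(event) for event in replay_events)
print(reduce_events(replay_events) == reduce_events(parse_sse(live_stream)))
```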

Run from VS Code

The repo ships a .vscode/ workspace so the agent runs from the IDE — the same copilot entry point, surfaced as Run Task and Run and Debug entries:

  • Terminal → Run Task lists each loop: AppInspect self-heal, SPL lint self-heal, Onboard (live MCP), plus Live stream server, Dashboard, and Run Python tests. AppInspect and SPL lint need no Splunk.
  • Run and Debug (launch.json) starts any loop under debugpy with the venv interpreter, so you can set breakpoints in the self-heal engine.
  • extensions.json recommends the Python, debugpy, and Bun extensions.

Run make setup first so .venv exists; the tasks call .venv/bin/copilot.

Design Principle

The self-heal engine is intentionally constrained. The LLM produces diagnosis and rationale text; deterministic patch functions make file changes. That gives the demo repeatability, keeps patch provenance reviewable, and makes the platform thesis credible.
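
One way to picture that split is a provenance record in which the text fields could come from an LLM while the changed paths come only from a deterministic patch function. The sketch below is illustrative: the ProvenanceEntry class and its field names are assumptions loosely based on the ledger fields listed earlier, not the schema of the project's provenance.jsonl.

```python
import io
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ProvenanceEntry:
    diagnosis: str            # free text, e.g. produced by an LLM
    rationale: str            # free text, e.g. produced by an LLM
    patch: str                # name of the deterministic patch function
    changed_paths: list[str]  # reported by the patch function itself
    validation: str           # result of the re-validation step
    timestamp: str            # ISO 8601 string supplied by the caller


def append_entry(handle, entry: ProvenanceEntry) -> None:
    """Append one entry as a single JSON line."""
    handle.write(json.dumps(asdict(entry), sort_keys=True) + "\n")


def read_entries(handle) -> list[ProvenanceEntry]:
    """Read every non-empty JSON line back into an entry."""
    return [ProvenanceEntry(**json.loads(line)) for line in handle if line.strip()]


entry = ProvenanceEntry(
    diagnosis="The app ships a local/ directory.",
    rationale="Packaged apps should carry configuration in default/ only.",
    patch="remove_local_dir",
    changed_paths=["local/"],
    validation="pass",
    timestamp="2026-01-05T10:00:00+00:00",
)

ledger = io.StringIO()  # stands in for a provenance.jsonl file
append_entry(ledger, entry)
ledger.seek(0)
print(read_entries(ledger) == [entry])
```

Keeping the record append-only and one JSON object per line makes the ledger easy to diff and to replay in order.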

About

Self-healing loops that carry a Splunk app from raw logs to AppInspect-green to cost-clean SPL, validated through the Splunk MCP server with an auditable provenance ledger.
