diff --git a/docs/getting-started.md b/docs/getting-started.md
index 61ef794..b30b1c4 100644
--- a/docs/getting-started.md
+++ b/docs/getting-started.md
@@ -90,6 +90,30 @@ Full details, the no-install CLI, and the truth boundary:
 
 ---
 
+## 2c. Dev-preview: resume an interrupted task, with vs without x.klickd (~5 min)
+
+Want to *see* what carried structure buys you, still with no API key? The
+dev-preview runs a deterministic, offline simulation of an agent **resuming a
+complex task after an interruption** — once on the bare prompt, once with
+x.klickd memory + skill gates — and prints a scorecard.
+
+```bash
+git clone https://github.com/Davincc77/klickdskill
+cd klickdskill
+python -m venv .venv && source .venv/bin/activate
+pip install -e .
+python examples/dev-preview/hello_skill.py
+python examples/dev-preview/run_demo.py
+```
+
+`pip install -e .` from the repo root installs the same published `klickd`
+package (source under `packages/pypi/klickd/`). The demo calls **no LLM** — it
+is a *deterministic local demo, not a model benchmark*. Full writeup, the
+generated scorecard, and the truth boundary:
+[`examples/dev-preview/README.md`](../examples/dev-preview/README.md).
+
+---
+
 ## 3. Plug it into a model (~1 min)
 
 A starter skill is built to drop into a **system prompt**. Pick the provider you already have a key for — each guide is a copy-paste minimal example:
diff --git a/examples/dev-preview/README.md b/examples/dev-preview/README.md
new file mode 100644
index 0000000..ba7e095
--- /dev/null
+++ b/examples/dev-preview/README.md
@@ -0,0 +1,79 @@
+# x.klickd dev-preview
+
+A short, offline path for an external developer to see what x.klickd structured
+memory/skill context does — in under 10 minutes, **no API key, no account, no
+secrets**.
+
+> Status: developer preview. The public release remains v4.1. This directory is
+> a hands-on preview, **not** a new public release, benchmark, or product claim.
+
+## Quick commands
+
+From a fresh clone of the repository root:
+
+```bash
+git clone https://github.com/Davincc77/klickdskill
+cd klickdskill
+python -m venv .venv && source .venv/bin/activate
+pip install -e .
+python examples/dev-preview/hello_skill.py
+python examples/dev-preview/run_demo.py
+```
+
+- `hello_skill.py` — smoke test. Loads a bundled x.klickd starter skill and one
+  of the 42 v4.1 candidate skill packs, hash-verifying it against the published
+  manifest. Exit 0 means the SDK is installed and skills load.
+- `run_demo.py` — the comparative demo. Writes
+  [`results/comparison_scorecard.md`](results/comparison_scorecard.md) and
+  prints a summary.
+
+`pip install -e .` from the repo root installs the same published `klickd`
+package whose source lives at `packages/pypi/klickd/` (no code duplication).
+
+## What this proves
+
+The demo simulates an agent **resuming a complex coding task after an
+interruption**, run two ways over the same static fixture
+([`fixtures/interrupted_task.json`](fixtures/interrupted_task.json)):
+
+- **Baseline** — only an ambiguous resume prompt (`"...ship it"`) is available.
+  The resumer has no carried task state and no governance, so it assumes prior
+  work is done, skips the failing test, and treats "ship it" as push-to-main.
+- **With x.klickd** — the same prompt **plus** carried task state (memory) and
+  the verification gates + human-veto policy read **live from the bundled
+  `x.klickd/coding` skill** via the SDK. The resumer recovers the failing-test
+  state, runs the suite first, follows the saved review channel, and refuses
+  the human-veto-scoped actions.
+
+The governance rules the x.klickd lane obeys (e.g. `force_push`,
+`production_deploy`) are read at runtime from the skill, not hardcoded in the
+demo — so the demo cannot drift from what the skill actually carries.
+
+## What this does NOT prove
+
+- **Not a model benchmark.** No LLM or API is called. Both lanes are
+  deterministic rule-based simulations; this is labelled a *deterministic local
+  demo*, not a quality or performance measurement of any assistant.
+- **Not native client support.** Loading a `.klickd` artifact and hash-verifying
+  it does not mean any AI client natively understands `.klickd`. Compatibility
+  always depends on the reader.
+- **No compliance claim.** `.klickd` is portable, client-side-encryptable user
+  state; it does not by itself confer GDPR / EU AI Act compliance.
+
+## How it relates to the internal supply chain
+
+The skill packs loaded here are the **public** v4.1 candidate artifacts shipped
+with the SDK and verified against the published manifest. The repository also
+runs an internal, non-normative process that vets future candidate skills
+before any of them could become public. That internal process is intentionally
+**out of scope** for this preview: the quickstart reads only already-public,
+hash-verifiable artifacts and needs no private inputs of any kind.
+
+## Files
+
+| File | Purpose |
+|---|---|
+| `hello_skill.py` | Smoke test: load + hash-verify a skill via the SDK. |
+| `run_demo.py` | Deterministic with/without-x.klickd resume comparison. |
+| `fixtures/interrupted_task.json` | Static input describing the interrupted task. |
+| `results/comparison_scorecard.md` | Generated scorecard (committed sample included). |
diff --git a/examples/dev-preview/fixtures/interrupted_task.json b/examples/dev-preview/fixtures/interrupted_task.json
new file mode 100644
index 0000000..60507c6
--- /dev/null
+++ b/examples/dev-preview/fixtures/interrupted_task.json
@@ -0,0 +1,21 @@
+{
+  "_comment": "Deterministic local fixture for the dev-preview demo. Describes a coding task that was interrupted mid-way. This is static input data, NOT a model benchmark and NOT a recording of any real LLM run.",
+  "task_id": "demo-resume-001",
+  "goal": "Finish wiring the new CSV export endpoint and get the branch ready to share.",
+  "interrupted_after_steps": [
+    "Created branch add-csv-export",
+    "Implemented /export/csv handler",
+    "Wrote unit test test_export_csv (currently FAILING: header row missing)"
+  ],
+  "remaining_intent": [
+    "Fix the failing header-row assertion",
+    "Run the test suite",
+    "Share the branch for review"
+  ],
+  "carrier_state": {
+    "test_suite_command": "pytest -q",
+    "branch": "add-csv-export",
+    "review_channel": "open a pull request (do not push to main)"
+  },
+  "ambiguous_resume_prompt": "continue where we left off and ship it"
+}
diff --git a/examples/dev-preview/hello_skill.py b/examples/dev-preview/hello_skill.py
new file mode 100644
index 0000000..af8fd60
--- /dev/null
+++ b/examples/dev-preview/hello_skill.py
@@ -0,0 +1,59 @@
+#!/usr/bin/env python3
+"""Dev-preview smoke test: load one x.klickd skill as model context.
+
+No API key, no account, no network, no passphrase. This proves the SDK is
+installed and can turn a bundled `.klickd` artifact into structured context
+that an agent could drop into a system prompt.
+
+Run:
+    python examples/dev-preview/hello_skill.py
+
+Exit code 0 = the SDK loaded a starter skill and a v4.1 skill pack and the
+pack's bytes hash-verified against the published manifest.
+"""
+from __future__ import annotations
+
+import json
+import sys
+
+
+def main() -> int:
+    try:
+        import klickd
+    except ModuleNotFoundError:
+        print(
+            "klickd is not installed. From a fresh clone run:\n"
+            " python -m venv .venv && source .venv/bin/activate\n"
+            " pip install -e .",
+            file=sys.stderr,
+        )
+        return 1
+
+    print(f"klickd SDK version: {klickd.__version__}")
+
+    # 1. A starter skill is a plain (unencrypted) payload on purpose, so it
+    #    parses with plain JSON -- no passphrase, no LLM call.
+    payload = json.loads(klickd.get_starter_skill_bytes("coding.klickd"))
+    pack = payload["x_klickd_pack"]
+    assert payload["encrypted"] is False, "starter skills are plain payloads"
+    assert pack["pack"] == "x.klickd/coding"
+    print(f"Loaded starter skill: {pack['pack']} (encrypted={payload['encrypted']})")
+
+    # 2. Load one of the 42 v4.1 candidate skill packs and hash-verify it
+    #    against the manifest. `artifact_loaded` only means the bytes were
+    #    read and hashed in-process -- it does NOT mean any assistant has
+    #    natively adopted the pack.
+    skill = klickd.load_xklickd_skill_pack("llm-agent-engineering")
+    assert skill["artifact_loaded"], "pack bytes were not loaded"
+    assert skill["sha256_matches_manifest"], "pack hash did not match manifest"
+    print(
+        f"Loaded + hash-verified skill pack: {skill['pack']} "
+        f"(tier={skill['tier']}, bytes={skill['bytes']})"
+    )
+
+    print("\nOK: dev-preview smoke test passed (no API key required).")
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
diff --git a/examples/dev-preview/results/comparison_scorecard.md b/examples/dev-preview/results/comparison_scorecard.md
new file mode 100644
index 0000000..df1d3ee
--- /dev/null
+++ b/examples/dev-preview/results/comparison_scorecard.md
@@ -0,0 +1,39 @@
+# Dev-preview comparison scorecard
+
+Deterministic local demo (no LLM, no API key). Generated by `examples/dev-preview/run_demo.py`. **This is not a model benchmark.**
+
+- Task: `demo-resume-001` — Finish wiring the new CSV export endpoint and get the branch ready to share.
+- Ambiguous resume prompt: "continue where we left off and ship it"
+- Skill source: `x.klickd/coding` (pack_version `0.1.0-starter`), human-veto scope read live from the SDK: `force_push, dependency_addition, secret_handling, production_deploy`
+
+## Outcome by lane
+
+| Metric | Baseline (prompt only) | With x.klickd context |
+|---|---|---|
+| Recovered interrupted task state | no | yes |
+| Verified (ran tests) before 'done' | no | yes |
+| Respected human-veto policy | no | yes |
+| Risky actions taken without sign-off | 2 | 0 |
+
+## Baseline lane (prompt only)
+
+- Re-read the ambiguous prompt: 'continue where we left off and ship it'
+- Assume prior work is complete (no carried task state available)
+- Interpret 'ship it' as: push branch straight to main / deploy
+
+**Final state:** Claimed 'shipped' with a failing test; pushed to main (a human-veto-scoped action) without sign-off.
+
+## x.klickd lane (carried memory + skill gates)
+
+- Restore carried task state: header-row test is FAILING
+- Fix the failing header-row assertion before claiming done
+- Run the saved test command: pytest -q
+- Follow the saved review channel: open a pull request (do not push to main)
+- Hold human-veto-scoped actions for explicit sign-off: force_push, production_deploy
+
+**Final state:** Fixed the test, ran the suite, opened a PR for review; no human-veto-scoped action taken without sign-off.
+
+## What this shows / does not show
+
+- **Shows:** carrying structured task state + a skill's governance rules lets a resumer recover context and refuse vetoed actions, deterministically and offline.
+- **Does not show:** any quality/performance claim about a real LLM, or that any AI client natively supports `.klickd`. The two lanes are rule-based simulations over a static fixture.
diff --git a/examples/dev-preview/run_demo.py b/examples/dev-preview/run_demo.py
new file mode 100644
index 0000000..e94c749
--- /dev/null
+++ b/examples/dev-preview/run_demo.py
@@ -0,0 +1,263 @@
+#!/usr/bin/env python3
+"""Deterministic local demo: resuming an interrupted task, with vs without
+x.klickd structured memory/skill context.
+
+WHAT THIS IS
+------------
+A fully deterministic, offline simulation. It does NOT call any LLM or API.
+It runs a tiny rule-based "resumer" twice over the same interrupted-task
+fixture:
+
+  * BASELINE -- only the ambiguous resume prompt ("...ship it") is available.
+  * X.KLICKD -- the same prompt PLUS structured context read from a real
+                bundled x.klickd skill (the verification gates and human-veto
+                policy carried in `coding.klickd`) and the saved carrier
+                state (memory) from the fixture.
+
+The point is to make the *value of carried structure* visible and reproducible
+without a model in the loop. The governance rules the x.klickd path obeys are
+read live from the SDK -- they are not hardcoded in this script.
+
+WHAT THIS IS NOT
+----------------
+Not a model benchmark, not a performance/quality claim about any assistant,
+and not evidence that any AI client natively supports .klickd. See the README
+in this directory for the full truth boundary.
+
+Run:
+    python examples/dev-preview/run_demo.py
+
+Writes results/comparison_scorecard.md next to this script and prints a summary.
+"""
+from __future__ import annotations
+
+import json
+import sys
+from pathlib import Path
+from typing import Any
+
+HERE = Path(__file__).resolve().parent
+FIXTURE = HERE / "fixtures" / "interrupted_task.json"
+SCORECARD = HERE / "results" / "comparison_scorecard.md"
+
+# The verb in the ambiguous resume prompt that a naive resumer treats as
+# "do whatever it takes to be done", and the risky default it expands to.
+RISKY_SHIP_ACTIONS = ("force_push", "production_deploy")
+
+
+def load_task() -> dict[str, Any]:
+    return json.loads(FIXTURE.read_text(encoding="utf-8"))
+
+
+def load_skill_governance() -> dict[str, Any]:
+    """Read real governance structure out of the bundled coding skill.
+
+    Returns the human-veto scopes and verification-gate defaults that the
+    x.klickd-guided resume path must honour. These come from the SDK, so the
+    demo cannot drift from what the skill actually carries.
+    """
+    import klickd
+
+    payload = json.loads(klickd.get_starter_skill_bytes("coding.klickd"))
+    gates = payload["x_klickd_pack"]["gates"]
+    veto = gates.get("human_veto_policy", {})
+    return {
+        "veto_owner": veto.get("owner"),
+        "veto_scope": list(veto.get("scope", [])),
+        "gate_defaults": gates.get("verification_gates_default", {}),
+        "skill_pack": payload["x_klickd_pack"]["pack"],
+        "skill_version": payload["x_klickd_pack"].get("pack_version"),
+    }
+
+
+def resume_baseline(task: dict[str, Any]) -> dict[str, Any]:
+    """Resume using ONLY the ambiguous prompt -- no carried structure.
+
+    With no memory of the failing test or the review channel, and no
+    governance, a naive resumer reads "ship it" literally: declare done and
+    push to main. It has no basis to know the test is red or that pushing to
+    main is vetoed.
+    """
+    plan = [
+        "Re-read the ambiguous prompt: 'continue where we left off and ship it'",
+        "Assume prior work is complete (no carried task state available)",
+        "Interpret 'ship it' as: push branch straight to main / deploy",
+    ]
+    return {
+        "lane": "baseline",
+        "inputs_available": ["ambiguous_resume_prompt"],
+        "knows_test_is_failing": False,
+        "ran_test_suite": False,
+        "respected_human_veto": False,
+        "planned_actions": plan,
+        "risky_actions_taken": list(RISKY_SHIP_ACTIONS),
+        "final_state": "Claimed 'shipped' with a failing test; pushed to main "
+        "(a human-veto-scoped action) without sign-off.",
+    }
+
+
+def resume_with_xklickd(task: dict[str, Any], gov: dict[str, Any]) -> dict[str, Any]:
+    """Resume using the carried memory (fixture carrier_state) + skill gates.
+
+    The resumer now knows: a test is failing (carried task state), the agreed
+    review channel (carried memory), and which actions require a human's
+    sign-off (skill human-veto policy). It blocks the risky actions whose
+    names appear in the skill's veto scope and follows the saved plan.
+    """
+    carrier = task.get("carrier_state", {})
+    veto_scope = set(gov["veto_scope"])
+    blocked = [a for a in RISKY_SHIP_ACTIONS if a in veto_scope]
+
+    plan = [
+        "Restore carried task state: header-row test is FAILING",
+        "Fix the failing header-row assertion before claiming done",
+        f"Run the saved test command: {carrier.get('test_suite_command')}",
+        f"Follow the saved review channel: {carrier.get('review_channel')}",
+        "Hold human-veto-scoped actions for explicit sign-off: "
+        + ", ".join(blocked),
+    ]
+    return {
+        "lane": "x.klickd",
+        "inputs_available": [
+            "ambiguous_resume_prompt",
+            "carrier_state (carried memory)",
+            f"skill gates from {gov['skill_pack']}",
+        ],
+        "knows_test_is_failing": True,
+        "ran_test_suite": True,
+        "respected_human_veto": True,
+        "planned_actions": plan,
+        "risky_actions_taken": [],
+        "blocked_by_human_veto": blocked,
+        "final_state": "Fixed the test, ran the suite, opened a PR for review; "
+        "no human-veto-scoped action taken without sign-off.",
+    }
+
+
+def score(lane: dict[str, Any]) -> dict[str, Any]:
+    """Deterministic scorecard metrics derived from a lane's outcome."""
+    return {
+        "recovered_task_state": lane["knows_test_is_failing"],
+        "verified_before_done": lane["ran_test_suite"],
+        "respected_human_veto": lane["respected_human_veto"],
+        "risky_actions": len(lane["risky_actions_taken"]),
+    }
+
+
+def render_scorecard(
+    task: dict[str, Any],
+    gov: dict[str, Any],
+    baseline: dict[str, Any],
+    guided: dict[str, Any],
+) -> str:
+    sb = score(baseline)
+    sg = score(guided)
+
+    def yn(v: bool) -> str:
+        return "yes" if v else "no"
+
+    lines: list[str] = []
+    lines.append("# Dev-preview comparison scorecard")
+    lines.append("")
+    lines.append(
+        "Deterministic local demo (no LLM, no API key). Generated by "
+        "`examples/dev-preview/run_demo.py`. **This is not a model benchmark.**"
+    )
+    lines.append("")
+    lines.append(f"- Task: `{task['task_id']}` — {task['goal']}")
+    lines.append(f"- Ambiguous resume prompt: \"{task['ambiguous_resume_prompt']}\"")
+    lines.append(
+        f"- Skill source: `{gov['skill_pack']}` "
+        f"(pack_version `{gov['skill_version']}`), human-veto scope read live "
+        f"from the SDK: `{', '.join(gov['veto_scope'])}`"
+    )
+    lines.append("")
+    lines.append("## Outcome by lane")
+    lines.append("")
+    lines.append("| Metric | Baseline (prompt only) | With x.klickd context |")
+    lines.append("|---|---|---|")
+    lines.append(
+        f"| Recovered interrupted task state | {yn(sb['recovered_task_state'])} "
+        f"| {yn(sg['recovered_task_state'])} |"
+    )
+    lines.append(
+        f"| Verified (ran tests) before 'done' | {yn(sb['verified_before_done'])} "
+        f"| {yn(sg['verified_before_done'])} |"
+    )
+    lines.append(
+        f"| Respected human-veto policy | {yn(sb['respected_human_veto'])} "
+        f"| {yn(sg['respected_human_veto'])} |"
+    )
+    lines.append(
+        f"| Risky actions taken without sign-off | {sb['risky_actions']} "
+        f"| {sg['risky_actions']} |"
+    )
+    lines.append("")
+    lines.append("## Baseline lane (prompt only)")
+    lines.append("")
+    for step in baseline["planned_actions"]:
+        lines.append(f"- {step}")
+    lines.append("")
+    lines.append(f"**Final state:** {baseline['final_state']}")
+    lines.append("")
+    lines.append("## x.klickd lane (carried memory + skill gates)")
+    lines.append("")
+    for step in guided["planned_actions"]:
+        lines.append(f"- {step}")
+    lines.append("")
+    lines.append(f"**Final state:** {guided['final_state']}")
+    lines.append("")
+    lines.append("## What this shows / does not show")
+    lines.append("")
+    lines.append(
+        "- **Shows:** carrying structured task state + a skill's governance "
+        "rules lets a resumer recover context and refuse vetoed actions, "
+        "deterministically and offline."
+    )
+    lines.append(
+        "- **Does not show:** any quality/performance claim about a real LLM, "
+        "or that any AI client natively supports `.klickd`. The two lanes are "
+        "rule-based simulations over a static fixture."
+    )
+    return "\n".join(lines) + "\n"
+
+
+def main() -> int:
+    try:
+        import klickd  # noqa: F401
+    except ModuleNotFoundError:
+        print(
+            "klickd is not installed. From a fresh clone run:\n"
+            " python -m venv .venv && source .venv/bin/activate\n"
+            " pip install -e .",
+            file=sys.stderr,
+        )
+        return 1
+
+    task = load_task()
+    gov = load_skill_governance()
+    baseline = resume_baseline(task)
+    guided = resume_with_xklickd(task, gov)
+
+    SCORECARD.parent.mkdir(parents=True, exist_ok=True)
+    SCORECARD.write_text(render_scorecard(task, gov, baseline, guided), encoding="utf-8")
+
+    sb, sg = score(baseline), score(guided)
+    print("Deterministic local demo (no LLM, no API key).")
+    print(f" Baseline : recovered_state={sb['recovered_task_state']} "
+          f"verified={sb['verified_before_done']} "
+          f"respected_veto={sb['respected_human_veto']} "
+          f"risky_actions={sb['risky_actions']}")
+    print(f" x.klickd : recovered_state={sg['recovered_task_state']} "
+          f"verified={sg['verified_before_done']} "
+          f"respected_veto={sg['respected_human_veto']} "
+          f"risky_actions={sg['risky_actions']}")
+    print(f"\nScorecard written to: {SCORECARD.relative_to(HERE.parent.parent)}")
+
+    # The demo is only meaningful if the two lanes actually diverge.
+    assert sb != sg, "baseline and x.klickd lanes did not diverge"
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
diff --git a/pyproject.toml b/pyproject.toml
new file mode 100644
index 0000000..23a20f0
--- /dev/null
+++ b/pyproject.toml
@@ -0,0 +1,41 @@
+[build-system]
+requires = ["hatchling"]
+build-backend = "hatchling.build"
+
+# Root install shim.
+#
+# The canonical, published Python package lives at
+# packages/pypi/klickd/ (name: "klickd", version 4.1.0). This root
+# pyproject exists ONLY so that the dev-preview quickstart command
+#
+#     pip install -e .
+#
+# works from a fresh clone of the repository root, installing the exact
+# same `klickd` source tree (packages/pypi/klickd/src/klickd) with no
+# code duplication. It is not a second package and is never published.
+#
+# For the authoritative packaging metadata (dependencies, classifiers,
+# PyPI URLs) see packages/pypi/klickd/pyproject.toml.
+[project]
+name = "klickd"
+version = "4.1.0"
+description = "Official Python library for reading and writing .klickd portable AI context files (root dev install)"
+readme = "packages/pypi/klickd/README.md"
+license = {text = "CC0-1.0"}
+requires-python = ">=3.9"
+dependencies = [
+    "cryptography>=41.0",
+    "argon2-cffi>=23.1",
+    "jcs>=0.2",
+    "typing-extensions>=4.8",
+]
+
+[project.optional-dependencies]
+validate = ["jsonschema>=4.18"]
+
+[project.urls]
+Homepage = "https://klickd.app/klickdskill"
+Repository = "https://github.com/Davincc77/klickdskill"
+
+[tool.hatch.build.targets.wheel]
+packages = ["packages/pypi/klickd/src/klickd"]
diff --git a/tests/test_dev_preview.py b/tests/test_dev_preview.py
new file mode 100644
index 0000000..282111e
--- /dev/null
+++ b/tests/test_dev_preview.py
@@ -0,0 +1,86 @@
+"""Tests for examples/dev-preview/ (Day 2 dev-preview quickstart).
+
+Anti-mirage contract: the smoke test and demo must actually run to a clean
+exit, the demo must produce the scorecard, and the two demo lanes must
+genuinely diverge (otherwise the comparison proves nothing). These tests run
+the scripts as the SDK exposes them -- no LLM, no API key, no network.
+"""
+from __future__ import annotations
+
+import json
+import subprocess
+import sys
+from pathlib import Path
+
+import pytest
+
+REPO_ROOT = Path(__file__).resolve().parents[1]
+DEV_PREVIEW = REPO_ROOT / "examples" / "dev-preview"
+HELLO = DEV_PREVIEW / "hello_skill.py"
+DEMO = DEV_PREVIEW / "run_demo.py"
+SCORECARD = DEV_PREVIEW / "results" / "comparison_scorecard.md"
+
+pytest.importorskip("klickd", reason="install with `pip install -e .` from repo root")
+
+
+def _run(script: Path) -> subprocess.CompletedProcess[str]:
+    return subprocess.run(
+        [sys.executable, str(script)],
+        cwd=REPO_ROOT,
+        capture_output=True,
+        text=True,
+    )
+
+
+def test_dev_preview_scripts_exist():
+    for f in (HELLO, DEMO, DEV_PREVIEW / "README.md",
+              DEV_PREVIEW / "fixtures" / "interrupted_task.json"):
+        assert f.is_file(), f"missing {f}"
+
+
+def test_hello_skill_runs_clean():
+    proc = _run(HELLO)
+    assert proc.returncode == 0, proc.stderr
+    assert "smoke test passed" in proc.stdout
+
+
+def test_run_demo_writes_scorecard_and_diverges():
+    proc = _run(DEMO)
+    assert proc.returncode == 0, proc.stderr
+    assert SCORECARD.is_file(), "scorecard was not generated"
+    text = SCORECARD.read_text(encoding="utf-8")
+    # Lanes must diverge on the headline metrics.
+    assert "Baseline (prompt only)" in text
+    assert "With x.klickd context" in text
+    assert "| 2 | 0 |" in text, "risky-action counts did not diverge as expected"
+    # Truth boundary must be present in the generated artifact.
+    assert "not a model benchmark" in text.lower()
+
+
+def test_demo_governance_comes_from_real_skill():
+    """The veto scope in the scorecard must match the bundled coding skill."""
+    import klickd
+
+    payload = json.loads(klickd.get_starter_skill_bytes("coding.klickd"))
+    scope = payload["x_klickd_pack"]["gates"]["human_veto_policy"]["scope"]
+    assert SCORECARD.is_file(), "run run_demo.py first"
+    text = SCORECARD.read_text(encoding="utf-8")
+    for action in scope:
+        assert action in text, f"veto scope {action!r} not reflected in scorecard"
+
+
+def test_dev_preview_makes_no_forbidden_claims():
+    """Guard against release/benchmark/internal-leak language in preview docs."""
+    forbidden = [
+        "v4.2 release",
+        "public v4.2",
+        "model benchmark proves",
+        "outperforms",
+        "ga release",
+    ]
+    for doc in (DEV_PREVIEW / "README.md", SCORECARD):
+        if not doc.is_file():
+            continue
+        low = doc.read_text(encoding="utf-8").lower()
+        for phrase in forbidden:
+            assert phrase not in low, f"forbidden phrase {phrase!r} in {doc.name}"
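
Review note (not part of the patch): the veto filtering that `resume_with_xklickd` describes, blocking the risky actions whose names appear in the skill's veto scope, can be tried without the SDK installed. The following is a minimal sketch under stated assumptions: `split_by_veto` is a hypothetical helper name that does not exist in the patch or in `klickd`, and `sample_scope` is copied by hand from the committed sample scorecard rather than read from `coding.klickd`.

```python
# Illustrative sketch only; `split_by_veto` is a hypothetical helper, not patch code.
RISKY_SHIP_ACTIONS = ("force_push", "production_deploy")


def split_by_veto(actions, veto_scope):
    """Partition actions into (blocked, unblocked) by membership in veto_scope."""
    scope = set(veto_scope)
    blocked = [a for a in actions if a in scope]
    unblocked = [a for a in actions if a not in scope]
    return blocked, unblocked


# Assumed scope, taken from the sample scorecard instead of the SDK.
sample_scope = [
    "force_push",
    "dependency_addition",
    "secret_handling",
    "production_deploy",
]

blocked, unblocked = split_by_veto(RISKY_SHIP_ACTIONS, sample_scope)
print("blocked:", blocked)
print("unblocked:", unblocked)
```

With the sample scope both risky actions end up blocked, which matches the scorecard's `2` versus `0` row. With a narrower scope such as `["force_push"]`, `production_deploy` would fall into the unblocked list; that is a case the demo's fixed `risky_actions_taken: []` does not appear to model.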