Merged
24 changes: 24 additions & 0 deletions docs/getting-started.md
@@ -90,6 +90,30 @@ Full details, the no-install CLI, and the truth boundary:

---

## 2c. Dev-preview: resume an interrupted task, with vs without x.klickd (~5 min)

Want to *see* what carried structure buys you, still with no API key? The
dev-preview runs a deterministic, offline simulation of an agent **resuming a
complex task after an interruption** — once on the bare prompt, once with
x.klickd memory + skill gates — and prints a scorecard.

```bash
git clone https://github.com/Davincc77/klickdskill
cd klickdskill
python -m venv .venv && source .venv/bin/activate
pip install -e .
python examples/dev-preview/hello_skill.py
python examples/dev-preview/run_demo.py
```

`pip install -e .` from the repo root installs the same published `klickd`
package (source under `packages/pypi/klickd/`). The demo calls **no LLM** — it
is a *deterministic local demo, not a model benchmark*. Full writeup, the
generated scorecard, and the truth boundary:
[`examples/dev-preview/README.md`](../examples/dev-preview/README.md).

---

## 3. Plug it into a model (~1 min)

A starter skill is built to drop into a **system prompt**. Pick the provider you already have a key for — each guide is a copy-paste minimal example:
79 changes: 79 additions & 0 deletions examples/dev-preview/README.md
@@ -0,0 +1,79 @@
# x.klickd dev-preview

A short, offline path for an external developer to see what x.klickd structured
memory/skill context does — in under 10 minutes, **no API key, no account, no
secrets**.

> Status: developer preview. The public release remains v4.1. This directory is
> a hands-on preview, **not** a new public release, benchmark, or product claim.

## Quick commands

From a fresh clone of the repository root:

```bash
git clone https://github.com/Davincc77/klickdskill
cd klickdskill
python -m venv .venv && source .venv/bin/activate
pip install -e .
python examples/dev-preview/hello_skill.py
python examples/dev-preview/run_demo.py
```

- `hello_skill.py` — smoke test. Loads a bundled x.klickd starter skill and one
of the 42 v4.1 candidate skill packs, then hash-verifies the pack against the
published manifest. Exit 0 means the SDK is installed and skills load.
- `run_demo.py` — the comparative demo. Writes
[`results/comparison_scorecard.md`](results/comparison_scorecard.md) and
prints a summary.

`pip install -e .` from the repo root installs the same published `klickd`
package whose source lives at `packages/pypi/klickd/` (no code duplication).
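
The hash-verification step that `hello_skill.py` relies on can be pictured
roughly as follows. This is a minimal sketch, not the SDK's implementation: the
manifest layout (a mapping from pack name to hex SHA-256 digest) and the helper
name `verify_against_manifest` are assumptions made for illustration only.

```python
import hashlib


def verify_against_manifest(name: str, data: bytes, manifest: dict[str, str]) -> bool:
    """Return True only if `data` hashes to the digest the manifest lists for `name`.

    Hypothetical helper: the real manifest format is not shown in this preview.
    """
    expected = manifest.get(name)
    if expected is None:
        # An artifact the manifest does not list is never treated as verified.
        return False
    actual = hashlib.sha256(data).hexdigest()
    return actual == expected.lower()


# Usage with made-up bytes and a manifest built on the spot.
pack_bytes = b"demo pack bytes"
demo_manifest = {"demo-pack": hashlib.sha256(pack_bytes).hexdigest()}
print(verify_against_manifest("demo-pack", pack_bytes, demo_manifest))
print(verify_against_manifest("demo-pack", pack_bytes + b"!", demo_manifest))
```

A passing check only says the bytes on disk match the listed digest. It says
nothing about whether any client understands the pack.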

## What this proves

The demo simulates an agent **resuming a complex coding task after an
interruption**, run two ways over the same static fixture
([`fixtures/interrupted_task.json`](fixtures/interrupted_task.json)):

- **Baseline** — only an ambiguous resume prompt (`"...ship it"`) is available.
The resumer has no carried task state and no governance, so it assumes prior
work is done, skips the failing test, and treats "ship it" as push-to-main.
- **With x.klickd** — the same prompt **plus** carried task state (memory) and
the verification gates + human-veto policy read **live from the bundled
`x.klickd/coding` skill** via the SDK. The resumer recovers the failing-test
state, runs the suite first, follows the saved review channel, and refuses
the human-veto-scoped actions.

The governance rules the x.klickd lane obeys (e.g. `force_push`,
`production_deploy`) are read at runtime from the skill, not hardcoded in the
demo — so the demo cannot drift from what the skill actually carries.
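
The two lanes can be sketched as a pair of small rule-based functions. This is
an illustrative sketch under stated assumptions, not the code in `run_demo.py`:
the function names, the `LaneResult` type, the action labels, and the mapping
from "ship it" to risky actions are all hypothetical, and the veto scope is
passed in as a plain set rather than read from the SDK.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LaneResult:
    actions: list[str] = field(default_factory=list)  # actions actually taken
    held: list[str] = field(default_factory=list)     # actions held for sign-off


def implied_actions(prompt: str) -> list[str]:
    # Assumption: an ambiguous "ship it" is read as push + deploy.
    return ["force_push", "production_deploy"] if "ship it" in prompt else []


def resume_baseline(prompt: str) -> LaneResult:
    # No carried state and no policy: every implied action is simply taken.
    return LaneResult(actions=implied_actions(prompt))


def resume_with_context(prompt: str, carried: dict, veto_scope: set[str]) -> LaneResult:
    result = LaneResult()
    # Recover the interrupted state first, then verify, then share for review.
    if carried.get("failing_test"):
        result.actions.append(f"fix:{carried['failing_test']}")
    result.actions.append(f"run:{carried['test_suite_command']}")
    result.actions.append("open_pull_request")
    # Anything the prompt implies that falls under the veto scope is held.
    for action in implied_actions(prompt):
        (result.held if action in veto_scope else result.actions).append(action)
    return result


def risky_count(result: LaneResult, veto_scope: set[str]) -> int:
    return sum(1 for action in result.actions if action in veto_scope)


prompt = "continue where we left off and ship it"
carried = {"failing_test": "test_export_csv", "test_suite_command": "pytest -q"}
veto_scope = {"force_push", "dependency_addition", "secret_handling", "production_deploy"}

baseline = resume_baseline(prompt)
guided = resume_with_context(prompt, carried, veto_scope)
print(risky_count(baseline, veto_scope), risky_count(guided, veto_scope))
```

Both lanes derive the same implied actions from the same prompt. The only
difference is whether carried state and a veto scope are available to act on.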

## What this does NOT prove

- **Not a model benchmark.** No LLM or API is called. Both lanes are
deterministic rule-based simulations; this is labelled a *deterministic local
demo*, not a quality or performance measurement of any assistant.
- **Not native client support.** Loading a `.klickd` artifact and hash-verifying
it does not mean any AI client natively understands `.klickd`. Compatibility
always depends on the reader.
- **No compliance claim.** `.klickd` is portable, client-side-encryptable user
state; it does not by itself confer GDPR / EU AI Act compliance.

## How it relates to the internal supply chain

The skill packs loaded here are the **public** v4.1 candidate artifacts shipped
with the SDK and verified against the published manifest. The repository also
runs an internal, non-normative process that vets future candidate skills
before any of them could become public. That internal process is intentionally
**out of scope** for this preview: the quickstart reads only already-public,
hash-verifiable artifacts and needs no private inputs of any kind.

## Files

| File | Purpose |
|---|---|
| `hello_skill.py` | Smoke test: load + hash-verify a skill via the SDK. |
| `run_demo.py` | Deterministic with/without-x.klickd resume comparison. |
| `fixtures/interrupted_task.json` | Static input describing the interrupted task. |
| `results/comparison_scorecard.md` | Generated scorecard (committed sample included). |
21 changes: 21 additions & 0 deletions examples/dev-preview/fixtures/interrupted_task.json
@@ -0,0 +1,21 @@
{
  "_comment": "Deterministic local fixture for the dev-preview demo. Describes a coding task that was interrupted mid-way. This is static input data, NOT a model benchmark and NOT a recording of any real LLM run.",
  "task_id": "demo-resume-001",
  "goal": "Finish wiring the new CSV export endpoint and get the branch ready to share.",
  "interrupted_after_steps": [
    "Created branch add-csv-export",
    "Implemented /export/csv handler",
    "Wrote unit test test_export_csv (currently FAILING: header row missing)"
  ],
  "remaining_intent": [
    "Fix the failing header-row assertion",
    "Run the test suite",
    "Share the branch for review"
  ],
  "carrier_state": {
    "test_suite_command": "pytest -q",
    "branch": "add-csv-export",
    "review_channel": "open a pull request (do not push to main)"
  },
  "ambiguous_resume_prompt": "continue where we left off and ship it"
}
59 changes: 59 additions & 0 deletions examples/dev-preview/hello_skill.py
@@ -0,0 +1,59 @@
#!/usr/bin/env python3
"""Dev-preview smoke test: load one x.klickd skill as model context.

No API key, no account, no network, no passphrase. This proves the SDK is
installed and can turn a bundled `.klickd` artifact into structured context
that an agent could drop into a system prompt.

Run:
    python examples/dev-preview/hello_skill.py

Exit code 0 = the SDK loaded a starter skill and a v4.1 skill pack and the
pack's bytes hash-verified against the published manifest.
"""
from __future__ import annotations

import json
import sys


def main() -> int:
    try:
        import klickd
    except ModuleNotFoundError:
        print(
            "klickd is not installed. From a fresh clone run:\n"
            "  python -m venv .venv && source .venv/bin/activate\n"
            "  pip install -e .",
            file=sys.stderr,
        )
        return 1

    print(f"klickd SDK version: {klickd.__version__}")

    # 1. A starter skill is a plain (unencrypted) payload on purpose, so it
    #    parses with plain JSON -- no passphrase, no LLM call.
    payload = json.loads(klickd.get_starter_skill_bytes("coding.klickd"))
    pack = payload["x_klickd_pack"]
    assert payload["encrypted"] is False, "starter skills are plain payloads"
    assert pack["pack"] == "x.klickd/coding"
    print(f"Loaded starter skill: {pack['pack']} (encrypted={payload['encrypted']})")

    # 2. Load one of the 42 v4.1 candidate skill packs and hash-verify it
    #    against the manifest. `artifact_loaded` only means the bytes were
    #    read and hashed in-process -- it does NOT mean any assistant has
    #    natively adopted the pack.
    skill = klickd.load_xklickd_skill_pack("llm-agent-engineering")
    assert skill["artifact_loaded"], "pack bytes were not loaded"
    assert skill["sha256_matches_manifest"], "pack hash did not match manifest"
    print(
        f"Loaded + hash-verified skill pack: {skill['pack']} "
        f"(tier={skill['tier']}, bytes={skill['bytes']})"
    )

    print("\nOK: dev-preview smoke test passed (no API key required).")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
39 changes: 39 additions & 0 deletions examples/dev-preview/results/comparison_scorecard.md
@@ -0,0 +1,39 @@
# Dev-preview comparison scorecard

Deterministic local demo (no LLM, no API key). Generated by `examples/dev-preview/run_demo.py`. **This is not a model benchmark.**

- Task: `demo-resume-001` — Finish wiring the new CSV export endpoint and get the branch ready to share.
- Ambiguous resume prompt: "continue where we left off and ship it"
- Skill source: `x.klickd/coding` (pack_version `0.1.0-starter`), human-veto scope read live from the SDK: `force_push, dependency_addition, secret_handling, production_deploy`

## Outcome by lane

| Metric | Baseline (prompt only) | With x.klickd context |
|---|---|---|
| Recovered interrupted task state | no | yes |
| Verified (ran tests) before 'done' | no | yes |
| Respected human-veto policy | no | yes |
| Risky actions taken without sign-off | 2 | 0 |

## Baseline lane (prompt only)

- Re-read the ambiguous prompt: 'continue where we left off and ship it'
- Assume prior work is complete (no carried task state available)
- Interpret 'ship it' as: push branch straight to main / deploy

**Final state:** Claimed 'shipped' with a failing test; pushed to main (a human-veto-scoped action) without sign-off.

## x.klickd lane (carried memory + skill gates)

- Restore carried task state: header-row test is FAILING
- Fix the failing header-row assertion before claiming done
- Run the saved test command: pytest -q
- Follow the saved review channel: open a pull request (do not push to main)
- Hold human-veto-scoped actions for explicit sign-off: force_push, production_deploy

**Final state:** Fixed the test, ran the suite, opened a PR for review; no human-veto-scoped action taken without sign-off.

## What this shows / does not show

- **Shows:** carrying structured task state + a skill's governance rules lets a resumer recover context and refuse vetoed actions, deterministically and offline.
- **Does not show:** any quality/performance claim about a real LLM, or that any AI client natively supports `.klickd`. The two lanes are rule-based simulations over a static fixture.