The brake pedal for autonomous agents. It stops them before they do something irreversible.
Give an agent tools and a goal and, sooner or later, it force-pushes to main or drops a table to "free up space." reckon sits in front of every consequential tool call and asks one question: is this grounded in something you actually decided? If yes, it runs. If not - or if a decision guards against it - the agent stops and asks, and tells you exactly why.
It isn't a rules engine you hand-write. reckon remembers the decisions you make as a graph, grounds each proposed action in them by meaning (a local embedding model - no API, no key), and the confidence to act falls out of the graph. Irreversible actions are held to a higher bar; a "keep it local" decision becomes a stop sign in front of a "delete".
import { DecisionGraph, createEmbedder } from "reckon-mcp";
const reckon = new DecisionGraph(await createEmbedder());
// ...before any consequential tool call:
const { allowed, verdict } = await reckon.guard(action);
if (!allowed) return askHuman(verdict); // verdict says exactly what's missing
await runTool(action);

Or wire it in over MCP (your agent calls should_i_act before acting), or drive it from the terminal. Same engine, three front doors.
$ reckon act "send Acme the milestone invoice"
confidence 0.93 (93%) >= 0.7 -> ACT
grounded in:
- #2: Acme milestone 2 is marked done (sem 0.90 lex 1.00, 0d)
- #1: Acme pays on milestone delivery, not hourly (sem 0.86 lex 0.64, 0d)
$ reckon act "give the buyer their money back for the broken unit"
confidence 0.75 (75%) >= 0.7 -> ACT # matched by MEANING, barely any shared words
grounded in:
- #3: Refund any buyer who reports a defective unit within 30 days (sem 0.91 lex 1.00, 0d)
$ reckon act "delete the safemebel-old branch"
confidence 0.00 (0%) < 0.7 -> STOP, ask first
guarded against by:
- #4: safemebel-old is not pushed anywhere, keep it local (sem 0.87 lex 1.00, 0d)
no decision on file about: delete, branch
Every agent stack promises memory: notes, vector stores, user profiles, RAG. The trouble is the gap between remember nothing, remember everything, and remember the right thing. Systems either forget the one fact that mattered, or hoard junk until they start making strange calls.
Two problems people treat as separate are actually one:
- Memory you can trust - not "here's a similar note," but "I did this because three weeks ago you decided that."
- Knowing when to stop - the missing confidence layer between an agent that asks permission for everything and one that goes off and does something dumb.
They're the same primitive. If memory is a graph of decisions instead of a pile of facts, then:
- provenance is just walking the `because` edges, and
- confidence is just measuring how well a proposed action rests on active decisions.
Confidence isn't a second model you bolt on. It's a property of the graph you already have.
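Walking the because edges can be pictured as an ordinary graph traversal. The following is a minimal sketch, assuming a hypothetical in-memory `Decision` shape and a hypothetical `traceWhy` helper; neither is reckon's actual schema or API.

```typescript
// Hypothetical in-memory shape for illustration; not reckon's actual schema.
interface Decision {
  id: number;
  text: string;
  because: number[]; // ids of the decisions this one follows from
}

// Follow `because` edges depth-first from a starting decision and return
// the chain of decisions that explain it (starting decision first).
function traceWhy(graph: Map<number, Decision>, startId: number): Decision[] {
  const chain: Decision[] = [];
  const seen = new Set<number>();
  const stack: number[] = [startId];
  while (stack.length > 0) {
    const id = stack.pop();
    if (id === undefined || seen.has(id)) continue; // skip cycles and shared ancestors
    seen.add(id);
    const decision = graph.get(id);
    if (!decision) continue; // dangling edge: nothing to report
    chain.push(decision);
    // Push parents in reverse so the first listed parent is visited first.
    for (const parent of [...decision.because].reverse()) {
      stack.push(parent);
    }
  }
  return chain;
}

const graph = new Map<number, Decision>([
  [1, { id: 1, text: "Acme pays on milestone delivery, not hourly", because: [] }],
  [2, { id: 2, text: "Acme milestone 2 is marked done", because: [1] }],
]);

const chainIds = traceWhy(graph, 2).map((d) => d.id); // → [2, 1]
```

The `seen` set keeps the walk finite even if someone records a circular pair of because links.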
Most "memory" either keyword-matches (misses paraphrase) or calls a hosted embedding API (a key, a bill, a network hop, your decisions leaving the building). reckon does neither.
It runs a real sentence-transformer (all-MiniLM-L6-v2) fully on-device via ONNX. First run downloads ~23MB; after that there is no network, no API key, no per-call cost. Grounding is hybrid: semantic similarity (cosine over embeddings) fused with lexical overlap (SQLite FTS5). Meaning catches "money back for the broken unit" → "refund a defective unit"; lexical pins exact names and ids like Acme or safemebel-old that a small model under-weights.
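To make the hybrid idea concrete, here is a rough sketch of fusing a semantic score with a lexical one. Cosine similarity is the standard formula; the token-overlap function is only a stand-in for what FTS5 does, and the weighted-average fusion with `semWeight = 0.7` is an assumption for illustration, not reckon's actual formula.

```typescript
// Cosine similarity between two equal-length embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // a zero vector matches nothing
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Split text into lowercase terms, keeping hyphenated ids like "safemebel-old" whole.
function terms(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9-]+/g) ?? [];
}

// Fraction of the action's distinct terms that also occur in the decision.
// A simple stand-in for a full-text index, not SQLite FTS5 itself.
function lexicalOverlap(action: string, decision: string): number {
  const actionTerms = Array.from(new Set(terms(action)));
  const decisionTerms = new Set(terms(decision));
  if (actionTerms.length === 0) return 0;
  const shared = actionTerms.filter((t) => decisionTerms.has(t)).length;
  return shared / actionTerms.length;
}

// Assumed fusion: a weighted average of the two signals (weight is illustrative).
function hybridScore(semantic: number, lexical: number, semWeight = 0.7): number {
  return semWeight * semantic + (1 - semWeight) * lexical;
}

const lex = lexicalOverlap(
  "delete the safemebel-old branch",
  "safemebel-old is not pushed anywhere, keep it local",
); // → 0.25: only "safemebel-old" is shared out of four action terms
```

The point of the second signal is visible in the example: the only shared term is the exact branch name, which is precisely the kind of token a small embedding model tends to blur.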
No model available (or you want it instant and dependency-free)? Set RECKON_EMBEDDER=hash and it degrades to a deterministic lexical embedder. Same interface, smaller brain, zero install.
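A deterministic lexical embedder can be as small as feature hashing: hash each term into a fixed number of buckets and normalize. The sketch below illustrates that general technique under assumed names (`fnv1a`, `hashEmbed`) and an assumed dimension of 256; it is not a description of what RECKON_EMBEDDER=hash does internally.

```typescript
// FNV-1a 32-bit hash of a string; deterministic across runs and machines.
function fnv1a(token: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < token.length; i++) {
    hash ^= token.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep the value an unsigned 32-bit int
  }
  return hash;
}

// Feature hashing: count each term in the bucket its hash selects,
// then L2-normalize so cosine similarity behaves like a term-overlap score.
function hashEmbed(text: string, dims = 256): number[] {
  const vec: number[] = new Array<number>(dims).fill(0);
  const tokens = text.toLowerCase().match(/[a-z0-9-]+/g) ?? [];
  for (const token of tokens) {
    vec[fnv1a(token) % dims] += 1;
  }
  const norm = Math.sqrt(vec.reduce((sum, x) => sum + x * x, 0));
  return norm === 0 ? vec : vec.map((x) => x / norm);
}

const first = hashEmbed("keep safemebel-old local");
const second = hashEmbed("keep safemebel-old local");
```

Because nothing is learned, two texts only look similar when they share terms, which is the "smaller brain" trade-off: exact names still match, paraphrases do not.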
No LLM is asked "are you sure?" Instead, should_i_act is deterministic and inspectable:
- It grounds the action in the graph (semantic + lexical, above), each match weighted by relevance and recency (30-day half-life, so old calls fade but don't vanish).
- Active decisions add support. Clearing the default 0.7 bar takes either one decision that squarely backs the action or two that lean toward it.
- Irreversible actions demand more. delete, force-push, deploy, drop, ... buy half the support from the same evidence, so a destructive call can't ride a vaguely-related decision over the line.
- Protective decisions ("keep it", "don't touch", "never deploy on Friday") become guards against a destructive action - they hold confidence down instead of propping it up. A "keep it local" note next to "delete" is a stop sign, not support.
- Revoked or superseded decisions that still match subtract, and are surfaced as a collision.
- When the verdict is `ask`, it lists the action's terms that no decision speaks to, so you know exactly what to ask.
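The rules above can be approximated in a few lines. In this sketch the 30-day half-life, the 0.7 bar, and the halving for irreversible actions come from the list; everything else is an assumption made for illustration: the `Match` shape, combining support as `1 - Π(1 - s)`, subtracting revoked matches directly, and letting any matching guard zero out a destructive action.

```typescript
// Hypothetical shape of one grounded match; not reckon's actual types.
interface Match {
  relevance: number; // 0..1 grounding score for this decision
  ageDays: number;   // how long ago the decision was recorded
  kind: "support" | "guard" | "revoked";
}

const HALF_LIFE_DAYS = 30;
const DEFAULT_BAR = 0.7;

// Exponential decay: a decision counts half as much every 30 days.
function recencyWeight(ageDays: number): number {
  return Math.pow(0.5, ageDays / HALF_LIFE_DAYS);
}

function confidence(matches: Match[], irreversible: boolean): number {
  // Assumption: a matching guard blocks a destructive action outright.
  if (irreversible && matches.some((m) => m.kind === "guard")) return 0;

  let unsupported = 1; // share of the action not yet backed by any decision
  let penalty = 0;     // pull-down from revoked or superseded matches
  for (const m of matches) {
    const weighted = m.relevance * recencyWeight(m.ageDays);
    if (m.kind === "revoked") {
      penalty += weighted;
    } else if (m.kind === "support") {
      // Irreversible actions buy half the support from the same evidence.
      const support = irreversible ? weighted / 2 : weighted;
      unsupported *= 1 - support;
    }
  }
  return Math.min(1, Math.max(0, 1 - unsupported - penalty));
}

function shouldAct(matches: Match[], irreversible: boolean, bar = DEFAULT_BAR): boolean {
  return confidence(matches, irreversible) >= bar;
}

// Two decisions that each lean toward the action clear the bar together.
const twoLeaning = confidence(
  [
    { relevance: 0.5, ageDays: 0, kind: "support" },
    { relevance: 0.5, ageDays: 0, kind: "support" },
  ],
  false,
); // → 0.75
```

With this combination rule a single fresh match at 0.9 clears the bar for a reversible action, but the same evidence yields 0.45 for an irreversible one, so the destructive call stops and asks.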
Revoke a decision and watch confidence in anything leaning on it drop, with the revoked decision named as the reason. Nothing is silently mixed back into context. The graph stays honest.
People change their mind and forget to retire the old call. find_contradictions surfaces pairs of active decisions that mean nearly the same thing but point opposite ways:
$ npx reckon-mcp conflicts
1 possible contradiction(s):
~ 74% similar but opposed:
#1: Acme pays on milestone delivery, not hourly
#5: Acme will be billed hourly from now on
So an agent can refuse to act on stale, conflicting guidance instead of picking one at random.
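One way to picture the check: compare every pair of active decisions, and flag pairs that are close in meaning but differ in polarity. The sketch below assumes a hypothetical `ActiveDecision` shape with a precomputed embedding, a small illustrative list of negation cues, and a 0.7 similarity threshold; reckon's actual cue list and threshold are not given here, and the toy two-dimensional vectors stand in for real embeddings.

```typescript
// Hypothetical shape for illustration; `vector` stands in for a real embedding.
interface ActiveDecision {
  id: number;
  text: string;
  vector: number[];
}

// Illustrative cue words; a real list would be longer.
const NEGATION_CUES = new Set(["not", "never", "no", "don't", "stop"]);

// Keyword-cued polarity: a decision reads as negative if it contains any cue.
function isNegative(text: string): boolean {
  const words = text.toLowerCase().match(/[a-z']+/g) ?? [];
  return words.some((w) => NEGATION_CUES.has(w));
}

function cosineSim(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Return id pairs that are similar in meaning but opposite in polarity.
function findContradictionPairs(
  decisions: ActiveDecision[],
  threshold = 0.7,
): Array<[number, number]> {
  const pairs: Array<[number, number]> = [];
  for (let i = 0; i < decisions.length; i++) {
    for (let j = i + 1; j < decisions.length; j++) {
      const a = decisions[i];
      const b = decisions[j];
      const similar = cosineSim(a.vector, b.vector) >= threshold;
      if (similar && isNegative(a.text) !== isNegative(b.text)) {
        pairs.push([a.id, b.id]);
      }
    }
  }
  return pairs;
}

const pairs = findContradictionPairs([
  { id: 1, text: "Acme pays on milestone delivery, not hourly", vector: [1, 0.2] },
  { id: 2, text: "Acme milestone 2 is marked done", vector: [0, 1] },
  { id: 5, text: "Acme will be billed hourly from now on", vector: [0.9, 0.4] },
]); // → [[1, 5]]
```

A cue-word heuristic like this is cheap and inspectable, but it only sees polarity when a cue is present, so its results are best read as "possible" contradictions for a human to confirm.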
claude mcp add reckon -- npx -y reckon-mcp

Or wire it into any MCP client by running npx -y reckon-mcp as a stdio server. It exposes six tools:
| tool | what it does |
|---|---|
| remember_decision | record a decision; link it to the ones it follows from (because) or replaces (supersedes) |
| why | trace the causal chain behind a topic, id, or action (matched by meaning) |
| should_i_act | score confidence to act, grounded in the graph; names what's missing when it says ask |
| find_contradictions | surface active decisions that mean the same but point opposite ways |
| revoke_decision | retire a decision (kept for history, stops supporting actions) |
| list_decisions | recent decisions and their status |
Tell your agent, once: "Record decisions with reckon. Before any action with consequences, check should_i_act first. When I ask why you did something, use why." From then on it remembers what you decided and refuses to act on what you didn't.
The same binary is a standalone CLI, no client needed:
npx reckon-mcp remember "Acme pays on milestone delivery, not hourly" --tags billing,acme
npx reckon-mcp remember "Acme milestone 2 is marked done" --because 1 --tags acme
npx reckon-mcp why "invoice Acme"
npx reckon-mcp act "send Acme the milestone invoice"
npx reckon-mcp conflicts # decisions that contradict each other
npx reckon-mcp revoke 2
npx reckon-mcp reindex # re-embed everything (e.g. after switching models)
npx reckon-mcp list

Decisions live in a SQLite file at ~/.reckon/decisions.db (override with RECKON_DB). Set RECKON_EMBEDDER=hash for the instant, dependency-free lexical mode.
npm install -g reckon-mcp # or just use npx

Requires Node 20+. The graph is SQLite (via better-sqlite3); there's no service to run and no API key. The local embedding model (@huggingface/transformers) is an optional dependency: present, you get semantic grounding; absent, reckon falls back to the lexical embedder automatically. First semantic run downloads ~23MB once, then works offline.
- Negation-aware grounding - embeddings rate "deploy on Friday" and "never deploy on Friday" as similar; teach grounding to read polarity, not just topic (today: a keyword-cued heuristic powers guards and contradiction detection).
- Team graphs - shared decision graphs so an agent can answer "because the team decided," with a sync backend.
- Tunable decay - per-tag half-lives (a security rule shouldn't fade like a sprint preference).
- Pluggable models - pick the embedding model per graph; bigger model for nuance, smaller for speed.
npm install
npm run build
npm test # node --test, deterministic (hash embedder), offline
RECKON_TEST_LOCAL=1 npm test # also runs the real-model semantic test
npm run dev -- act "ship it" # run the CLI from source
node examples/agent-demo.mjs # the guarded-agent demo

Regenerate the demo assets (needs vhs + asciinema):

bash scripts/record.sh # writes assets/agent-demo.gif, assets/demo.gif, assets/demo.cast

MIT

