reckon

The brake pedal for autonomous agents. It stops them before they do something irreversible.

Give an agent tools and a goal and, sooner or later, it force-pushes to main or drops a table to "free up space." reckon sits in front of every consequential tool call and asks one question: is this grounded in something you actually decided? If yes, it runs. If not - or if a decision guards against it - the agent stops and asks, and tells you exactly why.

It isn't a rules engine you hand-write. reckon remembers the decisions you make as a graph, grounds each proposed action in them by meaning (a local embedding model - no API, no key), and the confidence to act falls out of the graph. Irreversible actions are held to a higher bar; a "keep it local" decision becomes a stop sign in front of a "delete".

Drop it in front of your agent's tools

import { DecisionGraph, createEmbedder } from "reckon-mcp";

const reckon = new DecisionGraph(await createEmbedder());

// ...before any consequential tool call:
const { allowed, verdict } = await reckon.guard(action);
if (!allowed) return askHuman(verdict);   // verdict says exactly what's missing
await runTool(action);

Or wire it in over MCP (your agent calls should_i_act before acting), or drive it from the terminal. Same engine, three front doors.

See it decide

$ reckon act "send Acme the milestone invoice"
confidence 0.93 (93%) >= 0.7 -> ACT
grounded in:
  - #2: Acme milestone 2 is marked done             (sem 0.90 lex 1.00, 0d)
  - #1: Acme pays on milestone delivery, not hourly (sem 0.86 lex 0.64, 0d)

$ reckon act "give the buyer their money back for the broken unit"
confidence 0.75 (75%) >= 0.7 -> ACT          # matched by MEANING, barely any shared words
grounded in:
  - #3: Refund any buyer who reports a defective unit within 30 days  (sem 0.91 lex 1.00, 0d)

$ reckon act "delete the safemebel-old branch"
confidence 0.00 (0%) < 0.7 -> STOP, ask first
guarded against by:
  - #4: safemebel-old is not pushed anywhere, keep it local  (sem 0.87 lex 1.00, 0d)
no decision on file about: delete, branch

The thesis

Every agent stack promises memory: notes, vector stores, user profiles, RAG. The trouble is the gap between remembering nothing, remembering everything, and remembering the right thing. Systems either forget the one fact that mattered or hoard junk until they start making strange calls.

Two problems people treat as separate are actually one:

- Memory you can trust - not "here's a similar note," but "I did this because three weeks ago you decided that."
- Knowing when to stop - the missing confidence layer between an agent that asks permission for everything and one that goes off and does something dumb.

They're the same primitive. If memory is a graph of decisions instead of a pile of facts, then:

- provenance is just walking the because edges, and
- confidence is just measuring how well a proposed action rests on active decisions.

Confidence isn't a second model you bolt on. It's a property of the graph you already have.
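
To make "walking the because edges" concrete, here is a minimal sketch. The `Decision` shape and the `traceWhy` helper are hypothetical names used for illustration; reckon's real storage is SQLite and its internals may look quite different.

```typescript
// Hypothetical in-memory shape, for illustration only.
interface Decision {
  id: number;
  text: string;
  because: number[]; // ids of the decisions this one follows from
}

// Depth-first walk over `because` edges. Returns the starting decision
// followed by every decision it ultimately rests on.
function traceWhy(graph: Map<number, Decision>, startId: number): Decision[] {
  const chain: Decision[] = [];
  const seen = new Set<number>();
  const stack: number[] = [startId];
  while (stack.length > 0) {
    const id = stack.pop()!;
    if (seen.has(id)) continue; // guard against cycles
    seen.add(id);
    const decision = graph.get(id);
    if (!decision) continue; // dangling edge: nothing to report
    chain.push(decision);
    stack.push(...decision.because);
  }
  return chain;
}

const graph = new Map<number, Decision>([
  [1, { id: 1, text: "Acme pays on milestone delivery, not hourly", because: [] }],
  [2, { id: 2, text: "Acme milestone 2 is marked done", because: [1] }],
]);

const chain = traceWhy(graph, 2);
console.log(chain.map((d) => `#${d.id}: ${d.text}`).join("\n"));
```

The point of the sketch is that provenance needs no extra machinery: the answer to "why?" is whatever the walk visits.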

Grounding is semantic, and it runs on your machine

Most "memory" either keyword-matches (misses paraphrase) or calls a hosted embedding API (a key, a bill, a network hop, your decisions leaving the building). reckon does neither.

It runs a real sentence-transformer (all-MiniLM-L6-v2) fully on-device via ONNX. First run downloads ~23MB; after that there is no network, no API key, no per-call cost. Grounding is hybrid: semantic similarity (cosine over embeddings) fused with lexical overlap (SQLite FTS5). Meaning catches "money back for the broken unit" → "refund a defective unit"; lexical pins exact names and ids like Acme or safemebel-old that a small model under-weights.
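
A rough sketch of what hybrid grounding could look like. Only the two ingredients (cosine similarity and lexical overlap) come from the description above. The token-overlap function is a simple stand-in for FTS5, and the linear 0.7 / 0.3 fusion in `hybridScore` is an assumption, not reckon's actual formula.

```typescript
// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Stand-in for the lexical side (NOT FTS5): the fraction of the query's
// distinct tokens that also appear in the decision text.
function lexicalOverlap(query: string, text: string): number {
  const tokenize = (s: string): Set<string> =>
    new Set(s.toLowerCase().match(/[a-z0-9-]+/g) ?? []);
  const queryTokens = tokenize(query);
  const textTokens = tokenize(text);
  if (queryTokens.size === 0) return 0;
  const hits = Array.from(queryTokens).filter((t) => textTokens.has(t)).length;
  return hits / queryTokens.size;
}

// ASSUMPTION: a plain weighted sum. The real fusion rule is not documented here.
function hybridScore(sem: number, lex: number, semWeight = 0.7): number {
  return semWeight * sem + (1 - semWeight) * lex;
}

const lex = lexicalOverlap(
  "delete the safemebel-old branch",
  "safemebel-old is not pushed anywhere, keep it local",
);
console.log(lex, hybridScore(0.87, lex));
```

The lexical term is what keeps an exact name like safemebel-old in play even when the embedding model treats it as noise.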

No model available (or you want it instant and dependency-free)? Set RECKON_EMBEDDER=hash and it degrades to a deterministic lexical embedder. Same interface, smaller brain, zero install.
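
For intuition, a deterministic lexical embedder can be built with feature hashing. The sketch below illustrates that general technique; it is not reckon's hash embedder, and the function names, the 256-dimension default, and the FNV-1a-style hash are all assumptions.

```typescript
// FNV-1a-style 32-bit hash over UTF-16 code units.
// Deterministic across runs and machines, with no dependencies.
function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // keep the value an unsigned 32-bit int
  }
  return h >>> 0;
}

// Hash each token into one of `dims` buckets, count occurrences,
// then L2-normalize so cosine similarity behaves sensibly.
function hashEmbed(text: string, dims = 256): number[] {
  const vec: number[] = new Array<number>(dims).fill(0);
  const tokens = text.toLowerCase().match(/[a-z0-9-]+/g) ?? [];
  for (const token of tokens) {
    vec[fnv1a(token) % dims] += 1;
  }
  const norm = Math.sqrt(vec.reduce((sum, x) => sum + x * x, 0));
  return norm === 0 ? vec : vec.map((x) => x / norm);
}

const first = hashEmbed("send Acme the milestone invoice");
const second = hashEmbed("send Acme the milestone invoice");
console.log(first.every((x, i) => x === second[i])); // same input, same vector
```

An embedder like this only sees shared tokens, which is why the fallback is described as a smaller brain: paraphrases with no words in common score near zero.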

How confidence works

No LLM is asked "are you sure?". Instead, should_i_act is deterministic and inspectable:

- It grounds the action in the graph (semantic + lexical, above), each match weighted by relevance and recency (30-day half-life, so old calls fade but don't vanish).
- Active decisions add support. Clearing the default 0.7 bar takes either one decision that squarely backs the action or two that lean toward it.
- Irreversible actions demand more. delete, force-push, deploy, drop, ... buy half the support from the same evidence, so a destructive call can't ride a vaguely-related decision over the line.
- Protective decisions ("keep it", "don't touch", "never deploy on Friday") become guards against a destructive action - they hold confidence down instead of propping it up. A "keep it local" note next to "delete" is a stop sign, not support.
- Revoked or superseded decisions that still match subtract, and are surfaced as a collision.
- When the verdict is ask, it lists the action's terms that no decision speaks to, so you know exactly what to ask.

Revoke a decision and watch confidence in anything leaning on it drop, with the revoked decision named as the reason. Nothing is silently mixed back into context. The graph stays honest.
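
The scoring rules in this section can be sketched roughly as follows. Only the 30-day half-life, the 0.7 threshold, and the halving for irreversible actions come from the text. The `Match` shape, the additive combination, and the way guards and revoked decisions subtract their full weight are assumptions made for illustration.

```typescript
// Hypothetical shape of one grounded match.
interface Match {
  relevance: number; // fused semantic + lexical score in [0, 1]
  ageDays: number;
  status: "active" | "revoked";
  guards: boolean; // true when a protective decision opposes the action
}

const HALF_LIFE_DAYS = 30; // from the text
const THRESHOLD = 0.7; // default bar, from the text

// Exponential decay: a match loses half its weight every 30 days.
function recencyWeight(ageDays: number): number {
  return Math.pow(0.5, ageDays / HALF_LIFE_DAYS);
}

// ASSUMPTION: support is summed and clamped to [0, 1].
function confidence(matches: Match[], irreversible: boolean): number {
  let score = 0;
  for (const m of matches) {
    const weight = m.relevance * recencyWeight(m.ageDays);
    if (m.status === "revoked" || m.guards) {
      score -= weight; // guards and stale decisions pull confidence down
    } else {
      score += irreversible ? weight / 2 : weight; // destructive calls get half
    }
  }
  return Math.min(1, Math.max(0, score));
}

const backing: Match = { relevance: 0.9, ageDays: 0, status: "active", guards: false };
const reversible = confidence([backing], false);
const destructive = confidence([backing], true);
console.log(reversible >= THRESHOLD ? "ACT" : "STOP, ask first");
console.log(destructive >= THRESHOLD ? "ACT" : "STOP, ask first");
```

Under these assumptions the same evidence that clears the bar for a reversible action leaves a destructive one well short of it, which is the behavior the rules describe.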

It catches decisions you contradicted

People change their mind and forget to retire the old call. find_contradictions surfaces pairs of active decisions that mean nearly the same thing but point opposite ways:

$ npx reckon-mcp conflicts

1 possible contradiction(s):
  ~ 74% similar but opposed:
      #1: Acme pays on milestone delivery, not hourly
      #5: Acme will be billed hourly from now on

So an agent can refuse to act on stale, conflicting guidance instead of picking one at random.
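
One way to picture the check: flag pairs of active decisions that are highly similar but differ in polarity. This is a sketch under loud assumptions. The cue list, the 0.7 similarity cutoff, and the `findContradictions` signature are hypothetical, and the similarity function is injected rather than computed by a real embedding model.

```typescript
interface Decision {
  id: number;
  text: string;
}

// ASSUMPTION: a tiny illustrative cue list, not reckon's actual heuristic.
const NEGATION_CUES = new Set(["not", "never", "no", "don't", "stop"]);

function isNegated(text: string): boolean {
  const tokens = text.toLowerCase().match(/[a-z']+/g) ?? [];
  return tokens.some((t) => NEGATION_CUES.has(t));
}

// Return every pair that is similar in topic but opposite in polarity.
function findContradictions(
  decisions: Decision[],
  similarity: (a: string, b: string) => number,
  minSimilarity = 0.7,
): Array<[Decision, Decision]> {
  const pairs: Array<[Decision, Decision]> = [];
  for (let i = 0; i < decisions.length; i++) {
    for (let j = i + 1; j < decisions.length; j++) {
      const a = decisions[i];
      const b = decisions[j];
      const similar = similarity(a.text, b.text) >= minSimilarity;
      if (similar && isNegated(a.text) !== isNegated(b.text)) {
        pairs.push([a, b]);
      }
    }
  }
  return pairs;
}

const decisions: Decision[] = [
  { id: 1, text: "Acme pays on milestone delivery, not hourly" },
  { id: 5, text: "Acme will be billed hourly from now on" },
];

// Stub similarity for the demo; a real run would compare embeddings.
const conflicts = findContradictions(decisions, () => 0.74);
console.log(conflicts.map(([a, b]) => `#${a.id} vs #${b.id}`));
```

A cue-based polarity test like this is crude: it would also flag two similar decisions where only one happens to contain "not" for unrelated reasons, so flagged pairs are best treated as possible contradictions rather than certain ones.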

Use it with Claude Code

claude mcp add reckon -- npx -y reckon-mcp

Or wire it into any MCP client by running npx -y reckon-mcp as a stdio server. It exposes six tools:

tool	what it does
`remember_decision`	record a decision; link it to the ones it follows from (`because`) or replaces (`supersedes`)
`why`	trace the causal chain behind a topic, id, or action (matched by meaning)
`should_i_act`	score confidence to act, grounded in the graph; names what's missing when it says ask
`find_contradictions`	surface active decisions that mean the same but point opposite ways
`revoke_decision`	retire a decision (kept for history, stops supporting actions)
`list_decisions`	recent decisions and their status

Tell your agent, once: "Record decisions with reckon. Before any action with consequences, check should_i_act first. When I ask why you did something, use why." From then on it remembers what you decided and refuses to act on what you didn't.

Use it from the terminal

The same binary is a standalone CLI, no client needed:

npx reckon-mcp remember "Acme pays on milestone delivery, not hourly" --tags billing,acme
npx reckon-mcp remember "Acme milestone 2 is marked done" --because 1 --tags acme
npx reckon-mcp why "invoice Acme"
npx reckon-mcp act "send Acme the milestone invoice"
npx reckon-mcp conflicts          # decisions that contradict each other
npx reckon-mcp revoke 2
npx reckon-mcp reindex            # re-embed everything (e.g. after switching models)
npx reckon-mcp list

Decisions live in a SQLite file at ~/.reckon/decisions.db (override with RECKON_DB). Set RECKON_EMBEDDER=hash for the instant, dependency-free lexical mode.

Install

npm install -g reckon-mcp      # or just use npx

Requires Node 20+. The graph is SQLite (via better-sqlite3); there's no service to run and no API key. The local embedding model (@huggingface/transformers) is an optional dependency: present, you get semantic grounding; absent, reckon falls back to the lexical embedder automatically. First semantic run downloads ~23MB once, then works offline.

Roadmap

- Negation-aware grounding - embeddings rate "deploy on Friday" and "never deploy on Friday" as similar; teach grounding to read polarity, not just topic (today: a keyword-cued heuristic powers guards and contradiction detection).
- Team graphs - shared decision graphs so an agent can answer "because the team decided," with a sync backend.
- Tunable decay - per-tag half-lives (a security rule shouldn't fade like a sprint preference).
- Pluggable models - pick the embedding model per graph; bigger model for nuance, smaller for speed.

Develop

npm install
npm run build
npm test                       # node --test, deterministic (hash embedder), offline
RECKON_TEST_LOCAL=1 npm test   # also runs the real-model semantic test
npm run dev -- act "ship it"   # run the CLI from source
node examples/agent-demo.mjs   # the guarded-agent demo

Regenerate the demo assets (needs vhs + asciinema):

bash scripts/record.sh         # writes assets/agent-demo.gif, assets/demo.gif, assets/demo.cast

License

MIT
