Stop Claude Code from hallucinating — Karpathy-grade discipline in 8 skills.
Read in Russian / Читать на русском →
LLMs ship competently-wrong code. They invent APIs that don't exist, build the wrong feature confidently, declare partial work done, and leave you to find out in production. This repo is a behavioral firewall.
Eight skills + two execution hooks, derived from Andrej Karpathy's observations on LLM coding pitfalls and from real failure cases. Drop into ~/.claude/skills/, get back a Claude Code that thinks before it codes, verifies before it claims, and tells the truth at the end of the session.
| Skill | What it stops |
|---|---|
| think-before-coding | Silent assumption-picking when ≥2 interpretations exist |
| simplicity-first | 200-line solutions to 50-line problems |
| surgical-edits | Drift, reformatting, "while I was here" creep |
| verifiable-criteria | "Should work now" without a concrete check |
| hostile-self-review | Declaring done after building the wrong thing |
| anti-hallucination | Inventing APIs, paths, line numbers, library functions |
| memory-hygiene | MEMORY.md silently truncating half your rules |
| session-closer | Sugarcoated end-of-session reports that hide blockers |
Plus two hooks that move discipline from suggestion to execution:
pre-commit-honest-review.sh— blocksgit commituntil you answer 6 self-review questionssession-end-truth.sh— auto-extracts TODOs / blockers / errors from the session transcript into your sessions log
The original andrej-karpathy-skills repo is one file, 67 lines, 4 principles. Most of those principles are already in Claude Code's default system prompt. It went viral on dev Twitter / Telegram anyway, because the framing is good.
This repo is what that repo would be if it actually pulled its weight:
- 8 skills instead of 1
- Each skill addresses a specific failure mode with concrete triggers, anti-patterns, and examples
- 2 execution hooks for the rules that benefit from being enforced, not just stated
- An honest benchmark that tells you exactly which skills add real value vs. which are reinforcement of defaults
We are not claiming this turns Claude into Karpathy. Marketing copies of the original do. We're claiming it adds the missing 30% — verifiable success criteria, hostile self-review, anti-hallucination, memory hygiene, honest session reports — that Claude Code defaults do not enforce.
git clone https://github.com/fbsmna-coder/karpathy-pro-max.git
cp -r karpathy-pro-max/skills/* ~/.claude/skills/That's it. Skills auto-load when relevant. To install hooks per-project:
# pre-commit honest review
ln -s /absolute/path/to/karpathy-pro-max/hooks/pre-commit-honest-review.sh \
.git/hooks/pre-commit
# session-end auto-extraction (in ~/.claude/settings.json)
{
"hooks": {
"SessionEnd": [
{"command": "/absolute/path/to/karpathy-pro-max/hooks/session-end-truth.sh"}
]
}
}Without Karpathy Pro Max:
User: Add caching to the user lookup. Agent: writes Redis client + cache-aside pattern + 12 tests + Dockerfile change. User: This is a CLI tool. There is no Redis. I meant
@lru_cache.
With Karpathy Pro Max (think-before-coding fires):
User: Add caching to the user lookup. Agent: "Caching" can mean a few things. Quick check of the codebase: single-process CLI, no Redis, SQLite only. My default:
functools.lru_cache. Confirm or pick: (1) lru_cache, (2) Redis, (3) SQLite-backed table. User: lru_cache, maxsize 1024. Agent: 4 lines. Done in 30 seconds.
Full version of this and other examples in examples/.
See CLAUDE.md for the master rules.
- Think before coding — surface assumptions, list interpretations, ask once
- Simplicity first — minimum viable code, 50 over 200
- Surgical edits — every line traces to the request
- Verifiable criteria — define success as a concrete artifact
- Hostile self-review — read your diff as a skeptic before declaring done
- Anti-hallucination — verify before claiming, cite
file:line, fetch fresh docs - Memory hygiene — keep CLAUDE.md / MEMORY.md inside its load window
- Session closer — honest status: shipped, not shipped, blocked, unclear
- You ship side projects in 30 minutes. This biases toward caution. Use judgment.
- You write greenfield code with no shared context. Some skills (memory-hygiene, surgical-edits) shine in larger codebases.
- You already have a customized
CLAUDE.md. Read benchmarks/against-defaults.md and install only what adds value — don't paste duplicates.
Concrete failure cases beat abstract principles. If you have a real session transcript where Claude Code shipped wrong code, open an issue with:
- The original request
- What Claude did
- What Claude should have done
- Which existing skill would have caught it (or which new skill is needed)
We add skills slowly. Eight is already at the edge of where adding more dilutes value. New skill additions require demonstrable failure modes not covered by the existing eight.
Maintained by @fbsmna-coder. Issues and PRs welcome.
MIT. See LICENSE.
- Behavioral observations: Andrej Karpathy
- Original skill format: forrestchang/andrej-karpathy-skills
- This repo: failure cases, hooks, expanded skills, honest benchmarks