Autonomous reverse-engineering agent. Drives HexCore's emulation engine mid-execution, making live decisions that bypass anti-analysis, resolve hashed APIs, and extract real behavior from hostile samples.
Project Pythia is an Anthropic hackathon project (Apr 21-26, 2026). It combines two things that have never been glued together before:
- Issue #17 Oracle Hook — a pause-and-respond callback system originally proposed by @YasminePayload to let external agents intervene during headless emulation.
- Claude Agent SDK — Anthropic's framework for building tool-wielding autonomous agents.
Pythia is the bridge. She registers as an Oracle listener on a HexCore emulation session, receives Decision Requests at every hook trigger, inspects the machine state with her tool set, and issues Decisions that HexCore applies before resuming execution.
Pythia is built on top of HexCore, an MIT + Apache 2.0 licensed binary analysis IDE maintained by @LXrdKnowkill. HexCore provides the execution infrastructure — decompilation pipeline, pattern matching engine (HQL), CPU emulation (Unicorn), static analysis (Capstone / LLVM-MC), session persistence — that Pythia orchestrates.
Project-Pythia(this repo) — 100 % new code, started April 21, 2026.- Oracle Hook — new contribution to HexCore on branch
feature/oracle-hook-hackathon. All commits dated within the hackathon window.
HexCore's core infrastructure (decompiler, HQL engine, automation pipeline, Unicorn emulation integration, SharedArrayBuffer zero-copy IPC channel — shipped as Project Perseus in HexCore v3.8.0) existed before the hackathon and is used as an open-source dependency. The same way any project uses Node.js, LLVM, or Capstone.
Every modern dynamic analysis sandbox has the same failure mode: hostile samples use timing checks, PEB reads, environment fingerprinting, and API hash resolution to detect the sandbox and abort silently. The analyst sees "no malicious behavior observed" — and the sample walks.
The classical defenses are:
- Hardcoded bypasses — one patch per known anti-debug variant. Doesn't scale.
- Manual reverse engineering — works, but costs an analyst hours per sample.
- Post-mortem re-emulation with patched input — slow, doesn't handle non-determinism.
Pythia is a fourth path: let a language model make the decisions, in real time, during emulation. Claude understands the context (registers, disassembly, pseudo-C, call stack) that a static patch list never could. When a new anti-debug variant appears, Pythia decides what to do on the fly, no engine update required.
The project was validated against two purpose-built red-team samples authored by the HexCore team. Each sample opens https://github.com/AkashaCorporation in the default browser via ShellExecuteW — but the URL is the last thing it does, after a chain of anti-analysis checks that must all pass.
Zero API plaintexts (FNV-1a hashed), XOR-split FNV constants, arithmetic-derived PEB offsets, KUSER shared-data timing, removed IsDebuggerPresent, opaque predicates on every anti-debug branch, no .rdata debug strings, XOR-PRNG URL decoded on stack, opaque function labels. Compiled with /NODEFAULTLIB /ENTRY:hc_entry — no CRT, no mainCRTStartup, no incidental IAT leaks.
Pythia result: 4 triggered decisions, ~$0.037 total cost, ~40 seconds end-to-end. Beacon URL https://github.com/AkashaCorporation surfaced via read_memory after the in-binary URL decoder ran. Canonical fixture: test/fixtures/azoth-beacon-demo.json.
- E10 — Runtime
.textintegrity baseline. Hashes 128 bytes of a neighbouring function at entry, re-hashes before the URL decode, exits silently on mismatch. Fails against any engine that injects INT3 software breakpoints. HexCore's Oracle Hook uses nativeUC_HOOK_CODE(code bytes untouched) — passes silently. - E11 — QueryPerformanceCounter tamper gate. Captures QPC at entry and before decode, exits on delta < 50. Against Azoth's synthetic QPC stub (constant
0x1), delta is zero → silent exit. Pythia handled this one on Sonnet by choosing the minimum-cost bypass (action=skiparound the whole compare block) rather than patching the QPC output. - E12 — Two-stage URL decoder with decoy.
sv_url_partialproduceshttp://127.0.0.1:31337/beacon/xyz.db(decoy) on the stack;sv_url_finalizeXORs a second mask to turn it into the real URL. An Oracle that pauses between the two calls reads the decoy. Pythia placed theread_memorytrigger after the finalize call and explicitly noted "confirmed stage-2 output, no decoy strings present" in her decision reasoning.
Pythia result: 5 triggered decisions, ~$0.081 total cost, ~46 seconds end-to-end. 3 Haiku pauses ($0.010 each) + 2 Sonnet escalations ($0.023 and $0.028) when the problem got subtle. Real beacon URL surfaced on first try, no manual intervention.
Both samples live at AkashaCorporationMalware and are reproducible end-to-end. See HexCore branch feature/oracle-hook-hackathon for the runner (scripts/pythia-azoth-run.mjs) and the integration bridge.
The cost figures above are from a specific reference run on Apr 22, 2026. Per-decision cost varies meaningfully between runs even on identical triggers, driven by:
- Prompt caching — Anthropic's prompt cache gives ~90% discount on reused system prompts. The first decision in a fresh session pays full freight; subsequent decisions hit the cache. A run started cold can cost up to ~3x more total than one that lands cache hits early.
- Tool-use iterations — Pythia may call
read_memoryordisassembleonce or several times before committing to a decision, depending on what she finds. Each round-trip is a separate API call. - Output length — model output is non-deterministic. The same trigger can produce a 200-token reasoning or a 600-token one.
In practice cost has ranged from ~$0.003 to ~$0.15 for the v7 demo across the runs I've measured. The mechanism is robust; the absolute number is not.
┌──────────────────────────────────────────────────────────────┐
│ Pythia Agent │
│ │
│ ┌──────────────┐ system prompt + request + tools │
│ │ Claude │ ──────────────────────────────────┐ │
│ │ (Haiku / │ │ │
│ │ Sonnet / │ ◄── DecisionResponse / ToolCall ──┘ │
│ │ Opus) │ │
│ └──────────────┘ │
│ ▲ │
│ │ DecisionRequest ┌─────────────────┐ │
│ │ ─────────────────────► │ Oracle Client │ │
│ │ ◄──────────────────── │ (this repo) │ │
│ │ DecisionResponse └─────────────────┘ │
│ ▲ │
└──────────────────────────────────────────┼───────────────────┘
│
SharedArrayBuffer /
stdio NDJSON
│
┌──────────────────────────────────────────┼───────────────────┐
│ HexCore ▼ │
│ │
│ ┌────────────────────┐ pause ┌──────────────────────┐ │
│ │ hexcore-unicorn │ ─────────► │ Oracle Hook │ │
│ │ (emulator) │ │ (new on hackathon │ │
│ │ │ ◄───────── │ branch) │ │
│ └────────────────────┘ resume └──────────────────────┘ │
│ │
│ + helix (decompiler), HQL (pattern matching), │
│ pathfinder (type recovery), elixir (Project Azoth) │
└──────────────────────────────────────────────────────────────┘
See ARCHITECTURE.md for the full protocol spec, tool set reference, and model routing rationale.
End-to-end pipeline is live and validated against both red-team samples above. Hackathon week progress:
| Day | Target | Status |
|---|---|---|
| Tue Apr 21 | Protocol spec, agent scaffold, HexCore branch plan | ✅ done |
| Wed Apr 22 | HexCore feature/oracle-hook-hackathon — trigger registry + pause/resume plumbing |
✅ done |
| Thu Apr 23 | stdio NDJSON transport, end-to-end handshake | ✅ done |
| Fri Apr 24 | Claude Agent SDK integration, live decide() loop, first real bypass demo | ✅ done (v6.1 unlocked) |
| Sat Apr 25 | Native breakpoint API in Elixir engine, Azoth pivot, stress test v7.0 | ✅ done (v7 unlocked on first try) |
| Sun Apr 26 | README, demo recording, submission | ✅ done |
The Oracle Hook mechanism is sample-agnostic, but trigger placement currently requires hand-authored job files. The demo against v6.1 and v7.0 covers the most common anti-analysis families (hash-resolved imports, PEB walks, timing checks, stack-decoded strings, multi-stage decoders). Samples that fit those families would map onto the same trigger patterns; samples with genuinely novel evasion mechanics need either a new pattern in Pythia's library or additional tooling on the HexCore side (loader coverage, devirtualization, syscall stubs).
Tested ground:
- x86_64 PE (Windows). 32-bit not tested.
- Azoth (clean-room) emulator path. Unicorn-with-real-DLLs path works in principle but the trigger reasons assume the Azoth synthetic LDR.
- Samples up to ~3M instructions of emulation budget.
Roadmap (post-hackathon):
- Auto-locate triggers from disassembly (the Claude Desktop skill is a first sketch of this — orchestrating Claude to identify call sites).
- ELF / Linux loader coverage (Project Azoth has the engine; the Pythia bridge needs trivial extension).
- Pattern library packaged as reusable trigger templates.
- Pythia as a published npm package (@hexcore/pythia) — users only need to run
npm installon HexCore, settings only ask for an API key, zero manual cloning.
npm install
npm run build
npx pythia rehearse --scenario timing-checkRehearsal runs a deterministic stub decision against a fixture. Used to validate the HexCore oracle plumbing before the live agent is wired. See ARCHITECTURE.md for how to switch into live mode once the API key is set.
echo "ANTHROPIC_API_KEY=sk-ant-..." > .env
npm install
npm run build
node ../vscode-main/scripts/pythia-azoth-run.mjs \
--job test/fixtures/azoth-beacon-demo.json
Drives the full Apr 22 demo against the v6.1 sample. Expect ~40s, ~$0.04, and a beacon URL surfaced from emulator memory.
---MIT. See LICENSE.
HexCore itself is dual-licensed MIT + Apache 2.0 and is consumed here as an external dependency.
- @YasminePayload — original Oracle Hook proposal (HexCore #17)
- @LXrdKnowkill — HexCore, architecture, Pythia implementation
- Anthropic Claude hackathon — for the deadline that made this happen
