feat(DRIVE): hybrid RAG (keyword + e5-large embedding) for NL edit grounding by dancinlife · Pull Request #153 · dancinlab/hexa-codex

dancinlife · 2026-06-04T20:23:02Z

What

DRIVE test-drive REPL now grounds the local model on curated fix-recipes via hybrid retrieval — keyword-precision first, embedding-recall fallback.

STAGE 1 keyword substring  →  exact, 0.02s, no model load (queries quote keys)
STAGE 2 in-process fastembed (multilingual-e5-large, query:/passage: prefixes)
        →  paraphrase fallback ONLY when STAGE 1 empty · conservative 0.80 thresh
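The two stages above can be sketched roughly as follows. This is a minimal illustration, not the contents of rag_retrieve.py: the `hybrid_retrieve` name, the dict-shaped KB, and the injectable `embed` callable are all assumptions made for the example. Only the stage order, the `query: `/`passage: ` prefixes, and the 0.80 threshold come from the description.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical: any callable mapping text to a vector (e.g. a wrapper around an e5 model).
Embed = Callable[[str], Sequence[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def hybrid_retrieve(query: str, recipes: Dict[str, str], embed: Embed,
                    threshold: float = 0.80) -> List[Tuple[str, str]]:
    """Return (key, recipe) pairs: keyword hits first, else the best embedding match."""
    # STAGE 1: exact substring match on recipe keys; no model is touched.
    q = query.lower()
    hits = [(k, r) for k, r in recipes.items() if k.lower() in q]
    if hits:
        return hits

    # STAGE 2: reached only when STAGE 1 is empty.
    # e5-style models expect "query: " and "passage: " prefixes.
    if not recipes:
        return []
    qv = embed("query: " + query)
    scored = [(cosine(qv, embed("passage: " + k + " " + r)), k, r)
              for k, r in recipes.items()]
    best = max(scored, key=lambda t: t[0])
    # Conservative threshold: prefer no grounding over a wrong recipe.
    return [(best[1], best[2])] if best[0] >= threshold else []
```

Passing the embedder in as a callable keeps the keyword path free of any model import, which matches the "no model load" property of STAGE 1. In practice `embed` would wrap the in-process fastembed model; that wiring is not shown here.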

Self-contained: no embedding server / LAN / VPN — fastembed runs in-process via ~/.drive-rag-venv. drive.hexa calls rag_retrieve.py, degrading to its inline keyword scan if the venv is absent. fix_recipes.txt is the runtime-editable KB.
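The degradation path described above might look like the sketch below, written in Python for illustration even though the real caller is drive.hexa. Several details are assumptions and not taken from the PR: the `key: recipe` line format for the KB, the `bin/python` layout inside the venv, and a command line in which rag_retrieve.py takes the query as its first argument and prints one match per line.

```python
import subprocess
from pathlib import Path
from typing import List


def keyword_scan(query: str, kb_lines: List[str]) -> List[str]:
    """Inline fallback: keep KB lines whose key (text before ':') occurs in the query."""
    q = query.lower()
    matches = []
    for line in kb_lines:
        key = line.split(":", 1)[0].strip().lower()
        if key and key in q:
            matches.append(line)
    return matches


def retrieve(query: str, kb_lines: List[str],
             venv_python: Path = Path.home() / ".drive-rag-venv" / "bin" / "python",
             script: Path = Path("rag_retrieve.py")) -> List[str]:
    """Use the venv-hosted retriever when available, else the inline keyword scan."""
    if venv_python.is_file() and script.is_file():
        try:
            # Hypothetical CLI: query as argv[1], one match per stdout line.
            proc = subprocess.run([str(venv_python), str(script), query],
                                  capture_output=True, text=True, timeout=30)
            if proc.returncode == 0:
                return [ln for ln in proc.stdout.splitlines() if ln.strip()]
        except (OSError, subprocess.TimeoutExpired):
            pass  # any launch failure degrades to the inline scan
    return keyword_scan(query, kb_lines)
```

With the venv missing, `retrieve("fix circle_area", ["circle_area: use pi * r ** 2"])` returns the matching line from the inline scan, so grounding still works without any embedding dependency.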

Honest finding

Embedding alone underperforms keyword on this KB — MiniLM had an "area" attractor out-ranking correct matches; e5-large fixes it but still misses ~1/4 of in-vocabulary queries. Hence keyword-primary, embedding-fallback (not embedding-only).

Validation — `_drivesim` 100-scenario harness (70 code + 30 NL-git)

| item | result |
|---|---|
| final | 100/100 (96 first-pass + 4 model-flake recoveries on retry) |
| git NL (commit/push/branch) | 30/30 first-pass |
| harness fix | input race root-caused as CR-vs-LF (drive PTY `input()` needs `\n`); drive itself was correct |
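The CR-vs-LF fix can be illustrated with a small helper that normalizes each scripted input line before it is written to the PTY. The actual harness drives the REPL through drive_one.exp, so this Python function and its `pty_line` name are hypothetical; the sketch only shows the rule that every line sent must end in `\n` and not `\r`.

```python
def pty_line(text: str) -> bytes:
    """Terminate one scripted input line with LF, dropping any CR/LF already present."""
    return (text.rstrip("\r\n") + "\n").encode("utf-8")
```

A line scripted as `"git commit\r"` is sent as `b"git commit\n"`, which is the terminator the description says the REPL's `input()` needs.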

Files

product: DRIVE/{drive.hexa, DRIVE.md, rag_retrieve.py, fix_recipes.txt}
harness: _drivesim/{gen.py, run.sh, run100.sh, retry.sh, drive_one.exp, manifest.json}
excluded (generated/cache): scenarios/, build/, fix_recipes.emb.json

🤖 Generated with Claude Code

…ounding

Ground the local test-drive model on curated fix-recipes so natural-language edits land the correct change.

Two-stage retrieval, precision-first:
- STAGE 1 keyword substring match (exact, 0.02s, no model load)
- STAGE 2 in-process fastembed (multilingual-e5-large, query:/passage: prefixes) as a paraphrase fallback only when STAGE 1 is empty; conservative 0.80 thresh.

Self-contained: no embedding server/LAN/VPN; fastembed runs in-process via ~/.drive-rag-venv. drive.hexa calls rag_retrieve.py, falling back to its inline keyword scan if the venv is absent. fix_recipes.txt is the runtime-editable KB.

Honest finding: embedding alone underperforms keyword on this KB (MiniLM had an "area" attractor that out-ranked correct matches; e5-large fixes it but still misses ~1/4 of in-vocabulary queries), hence keyword-primary, embedding-fallback.

Validated via _drivesim 100-scenario harness (70 code + 30 NL-git): 100/100 pass (96 first-pass + 4 model-flake recoveries on retry; all 30 git scenarios first-pass). Root-caused the harness input race as a CR-vs-LF issue (drive's PTY input() needs \n); drive itself was correct.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

dancinlife merged commit 60adc9b into main Jun 4, 2026

dancinlife deleted the feat/drive-rag-hybrid branch June 4, 2026 20:23

dancinlife merged 1 commit into main from feat/drive-rag-hybrid
