feat(DRIVE): hybrid RAG (keyword + e5-large embedding) for NL edit grounding #153
Merged
…ounding

Ground the local test-drive model on curated fix-recipes so natural-language edits land the correct change.

Two-stage retrieval, precision-first:
- STAGE 1: keyword substring match (exact, 0.02s, no model load)
- STAGE 2: in-process fastembed (multilingual-e5-large, query:/passage: prefixes) as a paraphrase fallback, used only when STAGE 1 is empty; conservative 0.80 threshold.

Self-contained: no embedding server/LAN/VPN. fastembed runs in-process via ~/.drive-rag-venv. drive.hexa calls rag_retrieve.py, falling back to its inline keyword scan if the venv is absent. fix_recipes.txt is the runtime-editable KB.

Honest finding: embedding alone underperforms keyword on this KB (MiniLM had an "area" attractor that out-ranked correct matches; e5-large fixes it but still misses ~1/4 of in-vocabulary queries), hence keyword-primary, embedding-fallback.

Validated via _drivesim 100-scenario harness (70 code + 30 NL-git): 100/100 pass (96 first-pass + 4 model-flake recoveries on retry; all 30 git scenarios first-pass). Root-caused the harness input race as a CR-vs-LF issue (drive's PTY input() needs \n); drive itself was correct.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
What
DRIVE test-drive REPL now grounds the local model on curated fix-recipes via hybrid retrieval — keyword-precision first, embedding-recall fallback.
Self-contained: no embedding server / LAN / VPN. fastembed runs in-process via ~/.drive-rag-venv. drive.hexa calls rag_retrieve.py, degrading to its inline keyword scan if the venv is absent. fix_recipes.txt is the runtime-editable KB.
Honest finding
Embedding alone underperforms keyword on this KB — MiniLM had an "area" attractor out-ranking correct matches; e5-large fixes it but still misses ~1/4 of in-vocabulary queries. Hence keyword-primary, embedding-fallback (not embedding-only).
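A minimal sketch of the keyword-primary, embedding-fallback order described above. It is not the contents of rag_retrieve.py: the `Recipe` shape, the `retrieve` signature, and the injected `embed` callable are assumptions made for illustration, and the actual fix_recipes.txt format is not shown in this PR. The embedder is passed in so the sketch runs without a model; an in-process embedder such as fastembed could be plugged in there. Passing `embed=None` stands in for the venv-absent case, where only the keyword stage runs.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# Hypothetical recipe shape; the real fix_recipes.txt layout may differ.
@dataclass(frozen=True)
class Recipe:
    keywords: tuple[str, ...]
    fix: str

# Hypothetical embedder interface: list of texts in, list of vectors out.
Embedder = Callable[[Sequence[str]], list[list[float]]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(
    query: str,
    recipes: Sequence[Recipe],
    embed: Optional[Embedder] = None,
    threshold: float = 0.80,
) -> list[Recipe]:
    q = query.lower()

    # STAGE 1: exact keyword substring match, no model involved.
    hits = [r for r in recipes if any(k.lower() in q for k in r.keywords)]
    if hits or embed is None:
        # Either we have precise hits, or no embedder is available
        # (assumed analogue of the venv being absent): keyword-only result.
        return hits

    # STAGE 2: paraphrase fallback, only reached when STAGE 1 is empty.
    # e5-style models expect "query: " / "passage: " prefixes on inputs.
    texts = ["query: " + query] + ["passage: " + r.fix for r in recipes]
    vectors = embed(texts)
    query_vec, passage_vecs = vectors[0], vectors[1:]

    scored = [(cosine(query_vec, pv), r) for pv, r in zip(passage_vecs, recipes)]
    kept = [(s, r) for s, r in scored if s >= threshold]  # conservative cutoff
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in kept]
```

Because STAGE 2 is only consulted when STAGE 1 returns nothing, a noisy embedding match can never displace an exact keyword hit, which is the property the finding above argues for.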
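Validation: _drivesim 100-scenario harness (70 code + 30 NL-git)
- 100/100 pass (96 first-pass + 4 model-flake recoveries on retry; all 30 git scenarios first-pass)
- Root-caused the harness input race as a CR-vs-LF issue (drive's PTY input() needs \n); drive itself was correct
Files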
- DRIVE/{drive.hexa, DRIVE.md, rag_retrieve.py, fix_recipes.txt}
- _drivesim/{gen.py, run.sh, run100.sh, retry.sh, drive_one.exp, manifest.json}
- scenarios/, build/, fix_recipes.emb.json

🤖 Generated with Claude Code