A fully agentic system that reads architectural home plans (PDFs or images), extracts structured data, and validates the plans against predefined building code rules.

    ┌─────────────────────────────────────────────────────────────────┐
    │  INPUT: Plan PDF/Image + Rules Text File                        │
    └───────────────┬─────────────────────────────────────────────────┘
                    │
                    ▼
    ┌──────────────────────────────────────────┐
    │  AGENT 1: GPT-4o Vision Extractor        │
    │  • Renders each PDF sheet at 300 DPI     │
    │  • Sends image to GPT-4o (high detail)   │
    │  • Extracts: rooms, doors, windows,      │
    │    stairs, setbacks, annotations         │
    │  • Output: structured JSON per element   │
    │  • Flags low-confidence extractions      │
    └───────────────┬──────────────────────────┘
                    │ extracted_plan.json
                    ▼
    ┌──────────────────────────────────────────┐
    │  AGENT 2: Vectorization Engine           │
    │  • Converts JSON elements → text chunks  │
    │  • Parses rules text → individual rules  │
    │  • Embeds both via text-embedding-3-large│
    │  • Builds two FAISS indexes              │
    │    (L2-normalized for cosine similarity) │
    │  • Saves indexes + chunks to disk        │
    └───────────────┬──────────────────────────┘
                    │ FAISS indexes + chunk metadata
                    ▼
    ┌──────────────────────────────────────────┐
    │  AGENT 3: Compliance Reasoning Engine    │
    │  For each rule:                          │
    │  1. Embed rule → retrieve top-K plan     │
    │     chunks via cosine similarity         │
    │  2. Pass rule + chunks to GPT-4o for     │
    │     logical compliance determination     │
    │  3. If INFO_NOT_FOUND → feedback loop    │
    │     re-queries Agent 1 with targeted     │
    │     question about missing element       │
    │  Output: COMPLIES / NON_COMPLIANT /      │
    │          INFO_NOT_FOUND per rule         │
    └───────────────┬──────────────────────────┘
                    │
                    ▼
    ┌─────────────────────────────────────────────────────────────────┐
    │  OUTPUT: compliance_summary.json + detailed_results.json        │
    └─────────────────────────────────────────────────────────────────┘

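
The Agent 3 control flow in the diagram can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the project's actual code: `check_rules`, `retrieve`, `judge`, and `requery_extractor` are hypothetical names standing in for the per-rule loop, the FAISS lookup, the GPT-4o reasoning call, and the targeted Agent 1 re-query. A single re-query attempt per rule is also an assumption.

```python
from typing import Callable, Dict, List


def check_rules(
    rules: List[str],
    retrieve: Callable[[str], List[str]],
    judge: Callable[[str, List[str]], str],
    requery_extractor: Callable[[str], List[str]],
    allow_requery: bool = True,
) -> Dict[str, str]:
    """Return one verdict per rule (all callables are hypothetical stand-ins)."""
    verdicts: Dict[str, str] = {}
    for rule in rules:
        # Step 1: fetch the plan chunks most relevant to this rule.
        chunks = retrieve(rule)
        # Step 2: ask the reasoning model for a verdict.
        verdict = judge(rule, chunks)
        # Step 3: on missing data, ask the extractor a targeted question once.
        if verdict == "INFO_NOT_FOUND" and allow_requery:
            extra = requery_extractor(rule)
            if extra:
                verdict = judge(rule, chunks + extra)
        verdicts[rule] = verdict
    return verdicts
```

Passing `allow_requery=False` in this sketch corresponds loosely to running without the feedback loop: rules with missing data simply stay `INFO_NOT_FOUND`.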

Install dependencies:

    pip install -r requirements.txt

Set your OpenAI API key:

    export OPENAI_API_KEY=sk-...

Run the full pipeline:

    python run_pipeline.py --plan /path/to/homeplan.pdf --rules sample_rules.txt

Agent 1 only (extract plan data):

    python agent1_extractor.py /path/to/homeplan.pdf
    # Output: extracted_plan.json

Agent 2 only (vectorize, requires Agent 1 output):

    python agent2_vectorizer.py extracted_plan.json sample_rules.txt
    # Output: vector_store/

Agent 3 only (compliance check, requires Agent 2 output):

    python agent3_compliance.py vector_store/ /path/to/homeplan.pdf
    # Output: compliance_results/

Run the full pipeline with `--no-requery`:

    python run_pipeline.py --plan plan.pdf --rules rules.txt --no-requery

| File | Description |
|---|---|
| `output/extracted_plan.json` | Structured extraction from Agent 1 |
| `output/vector_store/` | FAISS indexes and chunk pickles from Agent 2 |
| `output/compliance_results/compliance_summary.json` | High-level report: overall verdict, violations, gaps |
| `output/compliance_results/detailed_results.json` | Per-rule breakdown with reasoning and evidence |


| Verdict | Meaning |
|---|---|
| `COMPLIES` | Plan data meets the rule requirement |
| `NON_COMPLIANT` | Plan data clearly violates the rule |
| `INFO_NOT_FOUND` | Relevant data absent from plan (manual review needed) |

Overall plan verdicts:

- `COMPLIES`: All rules pass
- `NON_COMPLIANT`: At least one rule is violated
- `PARTIAL_REVIEW_NEEDED`: No violations found, but some rules couldn't be evaluated due to missing data

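
The mapping from per-rule verdicts to the overall plan verdict can be sketched as a small function. This is an illustration of the precedence described above, not the project's code: `overall_verdict` is a hypothetical name, and treating an empty rule set as `COMPLIES` is an assumption.

```python
from typing import Iterable


def overall_verdict(rule_verdicts: Iterable[str]) -> str:
    """Collapse per-rule verdicts into one plan-level verdict (hypothetical helper)."""
    verdicts = list(rule_verdicts)
    # Any violation dominates, regardless of missing data elsewhere.
    if "NON_COMPLIANT" in verdicts:
        return "NON_COMPLIANT"
    # No violations, but at least one rule could not be evaluated.
    if "INFO_NOT_FOUND" in verdicts:
        return "PARTIAL_REVIEW_NEEDED"
    # Every rule passed (assumption: an empty list also lands here).
    return "COMPLIES"


print(overall_verdict(["COMPLIES", "INFO_NOT_FOUND"]))  # PARTIAL_REVIEW_NEEDED
```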
Rules can be in either format:

Numbered rules:

    1. Minimum bedroom dimension shall be 7 feet in any direction.
    2. Maximum riser height is 7.75 inches.

Free text blocks (separated by blank lines):

    All egress windows shall have a minimum net clear opening of 5.7 square feet.

    Stairway width shall not be less than 36 inches.

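
One way a rules file in either format could be split into individual rules is sketched below. `parse_rules` is a hypothetical helper, not the parser Agent 2 actually uses. It assumes numbered rules each fit on one line and that a file uses one format throughout.

```python
import re
from typing import List

# Matches lines such as "1. Rule text" or "2) Rule text" (assumed numbering styles).
_NUMBERED = re.compile(r"^\s*\d+[.)]\s+(.*\S)\s*$")


def parse_rules(text: str) -> List[str]:
    """Split rules text into a list of rule strings (hypothetical helper)."""
    numbered = [m.group(1) for m in map(_NUMBERED.match, text.splitlines()) if m]
    if numbered:
        return numbered
    # Otherwise treat blank-line-separated blocks as one rule each,
    # collapsing internal line breaks into single spaces.
    blocks = re.split(r"\n\s*\n", text.strip())
    return [" ".join(block.split()) for block in blocks if block.strip()]
```

A value like `7.75` inside free text is not mistaken for a rule number, because the pattern requires whitespace after the `.` or `)`.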
- GPT-4o high detail mode is used for all plan images — critical for small annotations and rotated dimension text
- FAISS cosine similarity retrieves relevant plan elements per rule (similarity threshold: 0.15)
- LLM reasoning layer in Agent 3 performs actual logic/arithmetic, not just similarity matching
- Confidence flags on extracted elements are passed through to the compliance reasoning
- Feedback loop between Agent 3 and Agent 1 handles INFO_NOT_FOUND cases automatically
- Multi-sheet PDFs are fully supported — each sheet processed independently then merged
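
The retrieval step can be illustrated without FAISS: once vectors are L2-normalized, their inner product equals cosine similarity, which is why the indexes are built on normalized embeddings. The sketch below is pure Python with made-up toy vectors. `l2_normalize` and `top_k` are hypothetical helpers, the default `k=5` is an assumption (the source only says top-K), and treating the 0.15 threshold as inclusive is also an assumption.

```python
import math
from typing import List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def top_k(
    query: Sequence[float],
    chunk_vecs: Sequence[Sequence[float]],
    k: int = 5,              # assumed value; not stated in the source
    threshold: float = 0.15,  # similarity threshold from the notes above
) -> List[Tuple[int, float]]:
    """Return up to k (chunk_index, cosine_score) pairs, best first."""
    q = l2_normalize(query)
    scored = []
    for i, vec in enumerate(chunk_vecs):
        # Inner product of unit vectors is the cosine similarity.
        score = sum(a * b for a, b in zip(q, l2_normalize(vec)))
        if score >= threshold:
            scored.append((i, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-D "embeddings": chunk 1 is orthogonal to the query and is filtered out.
hits = top_k([1.0, 0.0], [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]], k=2)
print([index for index, _ in hits])  # [0, 2]
```

A low threshold such as 0.15 filters only clearly unrelated chunks, leaving the finer judgment to the LLM reasoning layer.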