Spbd1 · Spbd1 · May 18, 2026
diff --git a/.env.example b/.env.example
@@ -1,7 +1,24 @@
+# Backend server
 ARE_BACKEND_HOST=127.0.0.1
 ARE_BACKEND_PORT=8000
+
+# Frontend dev server
 ARE_FRONTEND_HOST=127.0.0.1
 ARE_FRONTEND_PORT=5173
+VITE_API_BASE=http://localhost:8000/api
+
+# Model provider selection
+# deterministic is the default local baseline and requires no API key.
 ARE_LLM_PROVIDER=deterministic
+ARE_MODEL_PROVIDER_ID=deterministic_baseline
+
+# Optional external providers. Keep real secrets in .env or your shell, not in Git.
 ARE_OPENAI_API_KEY=
 ARE_ANTHROPIC_API_KEY=
+ARE_OPENAI_COMPATIBLE_BASE_URL=
+ARE_OPENAI_COMPATIBLE_MODEL=
+
+# Local data paths
+ARE_DATA_DIR=data
+ARE_TAXONOMY_IMPORT_DIR=data/taxonomy/imports
+ARE_REPORT_DIR=data/reports
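For reference, the variables in this `.env.example` could be read on the backend roughly as follows. This is a minimal stdlib sketch, not the project's actual settings module (the Dockerfile suggests the project uses `pydantic_settings`). The `Settings` dataclass and `load_settings` helper are hypothetical names, and the defaults simply mirror the values in the file above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for some ARE_* variables from .env.example."""

    backend_host: str
    backend_port: int
    llm_provider: str
    openai_api_key: Optional[str]
    data_dir: str
    report_dir: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from an environment mapping (defaults to os.environ)."""
    env = os.environ if env is None else env

    def get(name: str, default: str) -> str:
        # Treat unset and empty values the same, because .env.example
        # ships the optional keys as empty strings.
        return env.get(name) or default

    return Settings(
        backend_host=get("ARE_BACKEND_HOST", "127.0.0.1"),
        backend_port=int(get("ARE_BACKEND_PORT", "8000")),
        llm_provider=get("ARE_LLM_PROVIDER", "deterministic"),
        openai_api_key=env.get("ARE_OPENAI_API_KEY") or None,
        data_dir=get("ARE_DATA_DIR", "data"),
        report_dir=get("ARE_REPORT_DIR", "data/reports"),
    )
```

Passing the mapping explicitly keeps the loader easy to test and avoids reading real secrets from the shell during unit tests.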
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,4 +1,19 @@
 # Changelog
 
-## 0.1.0
-- Initial local MVP scaffold for taxonomy-grounded argument risk analysis.
+All notable changes to this project will be documented here.
+
+## 0.1.0 - 2026-05-18
+
+### Added
+
+- Local-first FastAPI and React dashboard MVP.
+- One-command development startup script.
+- Docker Compose setup with backend, frontend, and a named data volume.
+- File-backed taxonomy packs, settings, reviews, reports, examples, and benchmarks.
+- Excel taxonomy import/export workflow.
+- Mini evaluation set with positives, negatives, and hard negatives.
+- Practical project documentation for setup, architecture, taxonomy design, annotation, evaluation, dashboard use, limitations, and roadmap.
+
+### Notes
+
+- Outputs are for human review only and must not be used for automated moderation, truth determination, or intent judgment.
diff --git a/Dockerfile b/Dockerfile
@@ -0,0 +1,26 @@
+FROM python:3.12-slim
+
+WORKDIR /app
+
+ENV PYTHONUNBUFFERED=1 \
+    PYTHONPATH=/app:/app/engine \
+    ARE_BACKEND_HOST=0.0.0.0 \
+    ARE_BACKEND_PORT=8000
+
+COPY pyproject.toml build_backend.py README.md ./
+COPY backend ./backend
+COPY engine ./engine
+COPY scripts ./scripts
+COPY data ./data
+COPY fastapi ./fastapi
+COPY pydantic ./pydantic
+COPY pydantic_settings ./pydantic_settings
+COPY openpyxl ./openpyxl
+COPY uvicorn ./uvicorn
+COPY yaml.py ./yaml.py
+
+RUN python -m pip install --upgrade pip && python -m pip install -e .[dev]
+
+EXPOSE 8000
+
+CMD ["python", "scripts/run_backend.py"]
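Once a container built from this image is running, a caller still needs to know when the backend is ready. The following is a minimal polling sketch, assuming the `/health` endpoint referenced in the README answers with HTTP 200 when the server is up. `wait_for_backend` and its `fetch` parameter are hypothetical, introduced so the retry loop can be exercised without a live server.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str) -> int:
    """Return the HTTP status code for a GET request to url."""
    with urllib.request.urlopen(url, timeout=2) as response:
        return response.status


def wait_for_backend(
    url: str = "http://localhost:8000/health",
    attempts: int = 10,
    delay: float = 0.5,
    fetch: Callable[[str], int] = http_status,
) -> bool:
    """Poll url until it answers 200 or the attempts run out."""
    for attempt in range(attempts):
        try:
            if fetch(url) == 200:
                return True
        except (urllib.error.URLError, OSError):
            # Server is not accepting connections yet; retry after a pause.
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

A script could call `wait_for_backend()` after `docker compose up` and before opening the dashboard, and exit with an error if it returns `False`.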
diff --git a/README.md b/README.md
@@ -1,65 +1,248 @@
 # Argument-Risk-Engine
 
-Argument-Risk-Engine is a practical, local, Chrome-first web application for taxonomy-grounded argument risk analysis. It is designed for human review: it does **not** automate moral judgement or decide truth. It identifies argument-level risk patterns and explains them with evidence grounded in the submitted text and active taxonomy.
-
-## Core principles
-
-- **Taxonomy-first:** every risk label comes from an explicit taxonomy entry.
-- **Evidence-grounded:** reports quote or locate supporting text spans.
-- **Conservative:** uncertain findings are marked as low confidence or omitted.
-- **Local-first:** the MVP runs without authentication or a database.
-- **Configurable models:** deterministic local analysis is the default; paid LLM providers can be configured through `data/config/model_profiles.yaml`.
-- **Workbook friendly:** taxonomy packs can be imported from and exported to Excel workbooks. The real taxonomy workbook is a user-managed external file and is intentionally not committed to Git.
+Argument-Risk-Engine is a local-first, taxonomy-grounded web application for reviewing argument-level risk patterns in text. It combines a FastAPI backend, a React dashboard, file-based taxonomy packs, evidence-span extraction, reports, and a small evaluation harness so contributors can install it quickly, inspect outputs, and improve the taxonomy without needing a database.
+
+**The goal is not to automate moral judgment or determine truth. The system identifies argument-level risk patterns and provides evidence-grounded explanations for human review.**
+
+## What the system does
+
+- Splits submitted text into reviewable claims.
+- Retrieves active taxonomy entries that match each claim.
+- Produces conservative risk findings only when textual evidence is available.
+- Shows evidence spans, confidence, severity, explanations, and false-positive warnings.
+- Lets users import/export taxonomy Excel workbooks from Chrome or the CLI.
+- Lets users configure deterministic or external model-provider profiles from Chrome.
+- Generates downloadable Markdown, HTML, and JSON reports.
+- Runs a small benchmark set with positives, negatives, and hard negatives for regression checks.
+
+## What the system does not do
+
+- It does **not** automate moderation, enforcement, ranking, or eligibility decisions.
+- It does **not** determine whether a statement is factually true.
+- It does **not** infer author intent or diagnose a person’s beliefs, bias, or character.
+- It does **not** replace trained human review in high-stakes workflows.
+- It does **not** claim that the current taxonomy is complete or scientifically validated.
+
+## Architecture
+
+```text
+Chrome dashboard (React/Vite)
+  |-- Analyze text / save report
+  |-- Taxonomy workbench import/export
+  |-- Model settings
+  |-- Evaluation and review pages
+          |
+          v
+FastAPI backend (backend/app)
+  |-- /api/analyze
+  |-- /api/taxonomy-workbench/*
+  |-- /api/settings/*
+  |-- /api/reports/*
+          |
+          v
+Argument risk engine (engine/argument_risk_engine)
+  |-- claim extraction
+  |-- lexical retrieval over active taxonomy packs
+  |-- deterministic baseline classifier
+  |-- scoring, calibration, explanation, reports
+          |
+          v
+Local files only for MVP (data/)
+  |-- taxonomy packs and workbook imports/exports
+  |-- model profile YAML
+  |-- demo inputs and benchmark JSONL
+  |-- review queue and generated reports
+```
 
 ## One-command setup
 
-From the repository root, run:
+From the repository root:
 
 ```bash
 python scripts/dev.py --install --run --open
 ```
 
-The command creates or reuses `.venv`, installs Python dependencies, installs frontend dependencies, seeds demo data, starts the FastAPI backend at <http://localhost:8000>, starts the Vite dashboard at <http://localhost:5173>, and opens the dashboard in your default browser.
+This command will:
+
+1. Create or reuse `.venv`.
+2. Install backend dependencies with `pip install -e .[dev]`.
+3. Install frontend dependencies with `npm install` in `frontend/`.
+4. Seed demo data and benchmark files.
+5. Import the first `.xlsx` workbook found in `data/taxonomy/imports/` if one is available.
+6. Start the backend at <http://localhost:8000>.
+7. Start the frontend at <http://localhost:5173>.
+8. Open the dashboard in your default browser.
+
+Stop both servers with `Ctrl+C`.
+
+## Manual backend setup
+
+```bash
+python -m venv .venv
+. .venv/bin/activate          # Windows: .venv\Scripts\activate
+python -m pip install --upgrade pip
+python -m pip install -e ".[dev]"
+python scripts/seed_demo_data.py
+python scripts/run_backend.py
+```
+
+Backend health check:
+
+```bash
+curl http://localhost:8000/health
+```
+
+## Manual frontend setup
 
-## Manual setup
+In a second terminal:
 
 ```bash
-make install
-make test
-make run-backend
-make run-frontend
+cd frontend
+npm install
+npm run dev -- --host 127.0.0.1
 ```
 
-Useful commands:
+Open <http://localhost:5173> in Chrome. The Vite app calls the backend at `http://localhost:8000` by default; `.env.example` sets the API base to `http://localhost:8000/api` through `VITE_API_BASE`.
+
+## Docker setup
 
 ```bash
-make dev              # install, seed, run, and open the dashboard
-make evaluate         # run the bundled mini evaluation set
-make import-taxonomy  # import an Excel taxonomy workbook
-make export-taxonomy  # export the active taxonomy to Excel
+docker compose up --build
 ```
 
-## Taxonomy workbook imports
+The compose file starts:
+
+- `backend`: FastAPI service on <http://localhost:8000>.
+- `frontend`: Vite dashboard on <http://localhost:5173>.
+- `are-data`: a named volume mounted at `/app/data` for MVP file storage.
+
+No database is required for the MVP.
+
+## Dashboard guide
+
+1. Open <http://localhost:5173> in Chrome.
+2. Use **Analyze** to paste text, run analysis, inspect claim cards, and save a report.
+3. Use **Reports** to preview and download saved reports.
+4. Use **Taxonomy Workbench** to validate packs, import an `.xlsx` workbook, export an `.xlsx` workbook, inspect coverage, and activate/deactivate entries.
+5. Use **Model Settings** to select the deterministic baseline or configure an OpenAI-compatible provider profile.
+6. Use **Evaluation** to run the bundled mini benchmark and inspect error categories.
+7. Use **Review** to inspect persisted review items and feedback examples.
+
+See `docs/dashboard_user_guide.md` for a screen-by-screen walkthrough.
+
+## Taxonomy import/export guide
 
-The real taxonomy workbook should be imported later from the dashboard or CLI and is intentionally not committed to Git. Place a local copy under `data/taxonomy/imports/` or choose it from Chrome in the Taxonomy Workbench. For CLI imports, run:
+### From Chrome
+
+1. Open **Taxonomy Workbench**.
+2. Choose a user-managed `.xlsx` workbook.
+3. Click **Import Excel**.
+4. Review import errors/warnings.
+5. Click **Validate taxonomy**.
+6. Click **Export Excel** to download the current active taxonomy workbook.
+
+### From the CLI
 
 ```bash
 python scripts/import_taxonomy_excel.py --input data/taxonomy/imports/argument_risk_taxonomy_living_workbook_v2_taxonomy_first.xlsx
+python scripts/export_taxonomy_excel.py data/taxonomy/exports/taxonomy.xlsx
+```
+
+The real taxonomy workbook is intentionally not committed. Generated import/export artifacts should remain local.
+
+## Model provider configuration
+
+The deterministic local baseline is the default and requires no API key. Provider metadata is stored in `data/config/model_profiles.yaml`; the active provider is stored in `data/config/app_settings.yaml`. Secrets should be supplied through environment variables or a local `.env` file copied from `.env.example`.
+
+```bash
+cp .env.example .env
+# edit ARE_LLM_PROVIDER, ARE_OPENAI_API_KEY, or custom provider values as needed
+```
+
+External providers are optional. When using them, keep output conservative and evidence-grounded; do not treat model output as a truth oracle.
+
+## API examples
+
+Analyze text:
+
+```bash
+curl -s http://localhost:8000/api/analyze \
+  -H 'Content-Type: application/json' \
+  -d '{"text":"Everyone on the project always ignores the checklist, even though the last review found exceptions."}'
+```
+
+List taxonomy entries:
+
+```bash
+curl -s http://localhost:8000/api/taxonomy
+```
+
+Export taxonomy workbook:
+
+```bash
+curl -L http://localhost:8000/api/taxonomy-workbench/export-excel -o taxonomy.xlsx
+```
+
+Reports are generated from an analysis payload in the dashboard. To query the reports endpoint directly:
+
+```bash
+curl -s http://localhost:8000/api/reports
+```
+
+## Example JSON output
+
+```json
+{
+  "analysis_id": "analysis_...",
+  "summary": {
+    "risk_count": 1,
+    "highest_severity": "medium",
+    "requires_human_review": true
+  },
+  "claims": [
+    {
+      "claim_id": "claim_1",
+      "text": "Everyone on the project always ignores the checklist",
+      "risks": [
+        {
+          "risk_id": "overgeneralization",
+          "label": "Overgeneralization",
+          "severity": "medium",
+          "confidence": 0.72,
+          "evidence_span": "Everyone",
+          "explanation": "The claim uses broad quantifier language and should be reviewed for overreach."
+        }
+      ]
+    }
+  ],
+  "warnings": ["Human review is required before using this output in consequential settings."]
+}
 ```
 
-Generated Excel exports and report files are local artifacts and are ignored by Git. Empty import/export/report directories are kept with `.gitkeep` files.
+Exact field order and confidence values may differ as the taxonomy and scoring rules evolve.
+
+## Evaluation notes
+
+Run:
+
+```bash
+make evaluate
+```
+
+The mini benchmark in `data/benchmarks/mini_eval_set.jsonl` is a practical regression set, not a scientific validation set. It includes positives, negatives, and hard negatives to monitor over-classification. Treat metrics as engineering signals and review false positives/false negatives manually.
+
+## Limitations
 
-## API overview
+The system may produce false positives or miss subtle risks, and it should not be used for automated moderation. It does not judge intent, determine factual truth, or diagnose bias in a person. Human review is required for high-stakes use. A large taxonomy does not mean a complete taxonomy, and it does not mean aggressive classification. See `docs/limitations.md`.
 
-- `POST /api/analysis/analyze` analyzes text and returns claims, risks, evidence, and a conservative summary.
-- `GET /api/taxonomy` lists active taxonomy entries.
-- `POST /api/taxonomy-workbench/import` imports an Excel workbook.
-- `GET /api/taxonomy-workbench/export` exports the taxonomy workbook.
-- `GET /api/settings` and `PUT /api/settings` manage local model settings.
-- `GET /api/reports/{analysis_id}.md` returns a markdown report.
+## Roadmap
 
-## Development notes
+Near-term priorities are stronger taxonomy quality checks, richer benchmark coverage, better report templates, provider-specific testing, and packaging improvements. See `docs/roadmap.md`.
 
-The MVP intentionally uses plain files under `data/` instead of a database. Review feedback is appended to `data/review/review_store.jsonl`; taxonomy packs live under `data/taxonomy/packs`; reports are written to `data/reports`.
+## Contributing
 
-See `docs/technical_architecture.md`, `docs/taxonomy_design.md`, and `docs/dashboard_user_guide.md` for details.
+- Keep claims conservative and evidence-grounded.
+- Add tests for engine, API, and import/export changes.
+- Update docs when changing routes, setup, taxonomy schema, or dashboard behavior.
+- Do not commit private taxonomy workbooks, API keys, generated reports, or local review artifacts.
+- Run `make test` and `make evaluate` before opening a pull request.
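The example JSON output in the README nests risks under claims. Below is a small sketch of how a client might flatten that payload into reviewable rows. It assumes only the field names shown in the example (`claims`, `risks`, `confidence`, `evidence_span`, and so on). `flatten_findings` and its `min_confidence` threshold are hypothetical helpers, not part of the project.

```python
from typing import Any, Dict, List


def flatten_findings(
    analysis: Dict[str, Any], min_confidence: float = 0.0
) -> List[Dict[str, Any]]:
    """Return one row per (claim, risk) pair at or above min_confidence."""
    rows = []
    for claim in analysis.get("claims", []):
        for risk in claim.get("risks", []):
            confidence = risk.get("confidence") or 0.0
            if confidence < min_confidence:
                continue
            rows.append(
                {
                    "claim_id": claim.get("claim_id"),
                    "risk_id": risk.get("risk_id"),
                    "severity": risk.get("severity"),
                    "confidence": confidence,
                    "evidence_span": risk.get("evidence_span"),
                }
            )
    # Highest-confidence findings first, so reviewers see them at the top.
    rows.sort(key=lambda row: row["confidence"], reverse=True)
    return rows


# Trimmed copy of the README example payload.
example = {
    "claims": [
        {
            "claim_id": "claim_1",
            "risks": [
                {
                    "risk_id": "overgeneralization",
                    "severity": "medium",
                    "confidence": 0.72,
                    "evidence_span": "Everyone",
                }
            ],
        }
    ]
}

for row in flatten_findings(example, min_confidence=0.5):
    print(row["claim_id"], row["risk_id"], row["confidence"], row["evidence_span"])
# prints: claim_1 overgeneralization 0.72 Everyone
```

The threshold only filters what a reviewer sees first. In keeping with the README, it should not be used to make automated decisions.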