}
```

## MLOps Additions (Open Source + Free)
- Reproducible training: `src/train.py` now accepts `--seed` and fixes RNGs. After training it writes `metrics.json` and `run_info.json` into the output model folder with args, data hash, and git commit (if available) for traceability.
- Health/CORS/metrics: FastAPI exposes `/healthz` and `/readyz`. CORS is enabled for React by default. Optional Prometheus metrics (set `ENABLE_METRICS=1`) if `prometheus-fastapi-instrumentator` is installed.
- Tests: unit and API tests live under `tests/`; dev dependencies are listed in `dev-requirements.txt`. CI via GitHub Actions runs the tests on pushes and PRs (free for public repositories).
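
The reproducibility bullet above can be sketched roughly as follows. This is an illustration of the idea, not the exact code in `src/train.py`; the helper names (`sha256_of_file`, `write_run_info`) and the `run_info.json` field names are assumptions:

```python
import hashlib
import json
import os
import subprocess

def sha256_of_file(path: str) -> str:
    """Hash the training data file so a run is traceable to its exact inputs."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def write_run_info(output_dir: str, args: dict, data_path: str) -> dict:
    """Record args, data hash, and git commit (if available) next to the model."""
    try:
        commit = subprocess.check_output(
            ["git", "rev-parse", "HEAD"], text=True, stderr=subprocess.DEVNULL
        ).strip()
    except Exception:
        commit = None  # not running inside a git checkout
    info = {
        "args": args,
        "data_sha256": sha256_of_file(data_path),
        "git_commit": commit,
    }
    os.makedirs(output_dir, exist_ok=True)
    with open(os.path.join(output_dir, "run_info.json"), "w") as f:
        json.dump(info, f, indent=2)
    return info
```

Recording the data hash alongside the seed and commit means any metrics file can be traced back to the exact inputs that produced it.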
### Configure CORS for React
- By default, all origins are allowed. To restrict them:
```bash
export ALLOW_ORIGINS=http://localhost:5173,http://localhost:3000  # Windows PowerShell: $env:ALLOW_ORIGINS="http://..."
uvicorn app.main:app --reload --port 8000
```
### Enable Prometheus metrics (optional)
```bash
export ENABLE_METRICS=1  # Windows PowerShell: $env:ENABLE_METRICS=1
```

- `reports/manual_eval/manual_eval_results.{json,csv}`: per-scenario expectations vs. predictions, true/false positives, misses.
- `reports/manual_eval/manual_eval_summary.md`: bullet-point narrative calling out gaps for non-technical leads.
- `reports/manual_eval/figures/*.png`: outcome distribution, per-case recall, and top missed components for slide-ready visuals.
Use `--top_k_fallback` (default 0) to add the best-scoring labels even when the sigmoid score is below the threshold—handy for exploratory edge-case analysis. Edit `reports/manual_eval_cases_100.jsonl` directly or regenerate it with:
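
One plausible reading of that threshold-plus-fallback logic, as a sketch (the function name and exact ordering are assumptions, not the evaluator's real implementation):

```python
import numpy as np

def select_labels(scores, labels, threshold=0.5, top_k_fallback=0):
    """Pick labels whose sigmoid score clears the threshold; optionally
    back-fill with the k highest scorers even when they fall below it."""
    scores = np.asarray(scores)
    picked = [labels[i] for i in np.where(scores >= threshold)[0]]
    if top_k_fallback > 0:
        # Highest scores first; skip labels the threshold already selected.
        for i in np.argsort(scores)[::-1][:top_k_fallback]:
            if labels[i] not in picked:
                picked.append(labels[i])
    return picked
```

With `top_k_fallback=0` this reduces to plain thresholding, which matches the documented default.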
```bash
python reports/build_manual_cases.py \
  --limit 100 \
  --output_path reports/manual_eval_cases_100.jsonl
```
The generator spans authentication, lending, collections, KYC, payments, reporting, disputes, core integration, omni-channel comms, and regulatory scenarios so each component in the taxonomy appears multiple times.
### Lightweight Run Tracking Artifacts
- After training, check your output dir (e.g., `models/distilbert_component_classifier/`) for:
  - `metrics.json`: evaluation metrics from the Trainer
  - `run_info.json`: training args, data SHA256, git commit
  - `label2id.json`: label mapping used at inference
These files are simple, portable, and versionable in git or any storage.
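
A hypothetical consumer of these artifacts, e.g. loading the label mapping back at inference time (the file names come from the list above; the loader itself is illustrative):

```python
import json
from pathlib import Path

def load_run_artifacts(model_dir: str):
    """Read back the tracking files written at training time."""
    model_dir = Path(model_dir)
    metrics = json.loads((model_dir / "metrics.json").read_text())
    run_info = json.loads((model_dir / "run_info.json").read_text())
    label2id = json.loads((model_dir / "label2id.json").read_text())
    # Invert the mapping so predicted ids can be decoded to label names.
    id2label = {int(v): k for k, v in label2id.items()}
    return metrics, run_info, id2label
```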
## Optional: Docker
```bash
docker run -p 8000:8000 component-identifier
```

- Training defaults target CPU-friendly settings (batch size 4, max length 256, 3–5 epochs). Adjust `--num_epochs`, `--learning_rate`, and other CLI flags as needed.
- The provided synthetic dataset is only for demonstration. Replace it with real, labeled production data for meaningful predictions.
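
Assuming those defaults, the training CLI's argument parser might look like the sketch below; `--seed`, `--num_epochs`, and `--learning_rate` appear elsewhere in this README, while `--batch_size` and `--max_length` are assumed flag names:

```python
import argparse

def build_arg_parser() -> argparse.ArgumentParser:
    """CPU-friendly defaults matching the README; flag names are illustrative."""
    p = argparse.ArgumentParser(description="Train the component classifier")
    p.add_argument("--seed", type=int, default=42)
    p.add_argument("--batch_size", type=int, default=4)
    p.add_argument("--max_length", type=int, default=256)
    p.add_argument("--num_epochs", type=int, default=3)
    p.add_argument("--learning_rate", type=float, default=5e-5)
    return p
```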