A LightGBM-based IDS trained on UNSW-NB15, deployed on Raspberry Pi 4 in passive monitoring mode.
From the repo root (.venv is not tracked):
uv sync --extra dev
Point your editor at .venv/bin/python. For notebooks, optionally register a kernel:
.venv/bin/python -m ipykernel install --user --name=ml-ids --display-name="Python (ML-IDS)"
Train the full-feature LightGBM on UNSW-NB15 (expects CSVs under data/ per src/config.py). Saves the model, preprocessor, labels, metrics, and hyperparams.json under the output directory (default models/full/):
uv run python scripts/train.py
Train with tuned hyperparameters (after hyperparameter tuning):
uv run python scripts/train.py --output-dir models/full_tuned --hyperparams models/full/best_params.json
Use --feature-cols pi for the 30-feature Pi subset, or --feature-cols nfstream for the 13-feature nfstream-verified subset (Phase B).
Train the 13-feature nfstream model:
uv run python scripts/train.py --feature-cols nfstream --output-dir models/nfstream
Test the inference pipeline offline against a PCAP:
uv run python scripts/test_pcap.py --pcap capture.pcap
Run the live detection daemon (requires nfstream):
uv run python scripts/deploy.py --iface wlan0 --model-dir models/nfstream
Run Optuna on the training split; writes study.db, best_params.json, and tuning_history.json under the output directory (default models/full/):
uv run python scripts/tune_hyperparams.py --n-trials 100 --output-dir models/full
Then train the final model:
uv run python scripts/train.py --output-dir models/full_tuned --hyperparams models/full/best_params.json
Multiple processes can share the same --output-dir / study.db for parallel trials.
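How scripts/train.py consumes the --hyperparams file is not shown here. As a minimal sketch, assuming best_params.json is a flat JSON object of LightGBM parameter names and that tuned values simply override defaults, the merge could look like the following. `DEFAULT_PARAMS` and `load_params` are hypothetical names for illustration, not taken from the project.

```python
import json
from pathlib import Path

# Hypothetical defaults for illustration; the project's real values may differ.
DEFAULT_PARAMS = {"objective": "multiclass", "learning_rate": 0.1, "num_leaves": 31}


def load_params(hyperparams_path=None):
    """Return the defaults, overridden by tuned values when a JSON file is given."""
    params = dict(DEFAULT_PARAMS)  # copy so the defaults stay untouched
    if hyperparams_path is not None:
        tuned = json.loads(Path(hyperparams_path).read_text())
        params.update(tuned)  # tuned keys win over defaults
    return params
```

Calling `load_params("models/full/best_params.json")` would then return the defaults with any tuned keys replaced, while `load_params()` returns the defaults unchanged.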
This project is split into two phases to validate feature availability and model performance across training and deployment environments.
Trains LightGBM on all 38 UNSW-NB15 features as a performance ceiling for comparison with Phase B. Training writes the model, preprocessor, label map, and test-set metrics under the output directory (default models/full/).
Outputs: Full-feature model, preprocessor, metrics.json, optional hyperparams.json / tuning artifacts if you run hyperparameter tuning.
Latest results (test set, from models/full/metrics.json):
| Metric | Value |
|---|---|
| Accuracy | 69.76% |
| Macro F1 | 0.5265 |
Produced with uv run python scripts/train.py (38-feature default).
Plan: docs/plans/phase-b-deployment.md
Trains LightGBM on the 13 features that nfstream can actually extract on the Pi
(NFSTREAM_FEATURE_COLS in src/data/loader.py). The model is deployed for real-time
passive flow classification on wlan0, with a configurable confidence threshold
(default 0.7), JSON Lines alert logging, and human-readable stdout output.
Model results (test set, from models/nfstream/metrics.json):
| Metric | Value |
|---|---|
| Accuracy | 69.23% |
| Macro F1 | 0.4983 |
| Features | 13 (11 numeric + 2 categorical: proto, service) |
Deployment architecture:
wlan0 → NFStreamer → FeatureExtractor → preprocessor.transform()
→ model.predict_proba() → threshold gate → AlertLogger (jsonl + stdout)
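The threshold gate in the pipeline above can be sketched as follows. This is an illustration under assumptions, not the code in src/inference/engine.py: it assumes the gate raises an alert only when the most probable class is not the normal class and its probability is at least the threshold. The function name `gate_prediction`, the `normal_label` parameter, and the example labels are hypothetical.

```python
def gate_prediction(proba, labels, threshold=0.7, normal_label="Normal"):
    """Pick the most probable class and decide whether it should raise an alert."""
    best = max(range(len(proba)), key=lambda i: proba[i])
    label, confidence = labels[best], proba[best]
    # Assumed rule: alert only on a non-normal class predicted with enough confidence.
    is_alert = label != normal_label and confidence >= threshold
    return {"label": label, "confidence": confidence, "alert": is_alert}


labels = ["Normal", "DoS", "Exploits"]
confident = gate_prediction([0.10, 0.75, 0.15], labels)  # DoS at 0.75
uncertain = gate_prediction([0.30, 0.45, 0.25], labels)  # DoS at 0.45
print(confident["alert"], uncertain["alert"])
```

With the default threshold of 0.7, the first flow passes the gate and the second does not, even though both have an attack class as the top prediction.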
Key files:
| File | Role |
|---|---|
| src/inference/engine.py | Load artifacts, predict with threshold gate |
| src/inference/feature_extractor.py | NFlow → 13-feature dict |
| src/inference/logger.py | JSON lines file + human stdout |
| scripts/deploy.py | Foreground daemon |
| scripts/test_pcap.py | Offline PCAP replay validation |
| deploy/ml-ids.service | systemd unit |
| deploy/setup_pi.sh | One-command Pi provisioning |
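The "JSON lines file + human stdout" role of the logger can be illustrated with a short sketch. The record fields (`ts`, `label`, `confidence`, `src_ip`, `dst_ip`) and the function name `log_alert` are assumptions made for this example; the actual schema in src/inference/logger.py is not documented here.

```python
import io
import json
import sys
from datetime import datetime, timezone


def log_alert(alert, jsonl_file, stdout=None):
    """Append one alert as a JSON line and print a readable one-line summary."""
    stdout = stdout if stdout is not None else sys.stdout
    record = {"ts": datetime.now(timezone.utc).isoformat(), **alert}
    jsonl_file.write(json.dumps(record) + "\n")  # one JSON object per line
    jsonl_file.flush()  # keep the file usable by `tail -f` style readers
    print(
        f"[ALERT] {record['label']} conf={record['confidence']:.2f} "
        f"{record['src_ip']} -> {record['dst_ip']}",
        file=stdout,
    )
    return record


# Usage with an in-memory file; a real daemon would open a log file in append mode.
buffer = io.StringIO()
log_alert(
    {"label": "DoS", "confidence": 0.91, "src_ip": "10.0.0.5", "dst_ip": "10.0.0.1"},
    buffer,
)
```

Writing one self-contained JSON object per line keeps the log appendable and lets downstream tools parse it line by line without loading the whole file.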
Plan: docs/plans/attack-simulator.md
Visual demo tool that generates synthetic network flows resembling real attacks, runs them through the trained ML-IDS model, and displays detections in a live React dashboard. Designed for non-technical audience demonstrations.
Architecture:
Browser ←WebSocket→ FastAPI Server → FlowGenerator → InferenceEngine
(port 8000) (synthetic) (same as live daemon)
One-time setup:
# 1. Install simulator Python deps
uv sync --extra simulator
# 2. Build prototype statistics from training data
uv run python scripts/build_simulator_prototypes.py
# 3. Install and build React frontend
cd web && npm install && npm run build && cd ..
Development (two terminals):
# Terminal 1: FastAPI backend
uv run python -m src.simulator.server --model-dir models/nfstream
# Terminal 2: Vite dev server (proxies /ws and /api to :8000)
cd web && npm run dev
# Open http://localhost:5173
Production (single command):
uv run python -m src.simulator.server \
--model-dir models/nfstream \
--static-dir web/dist \
--host 0.0.0.0 --port 8000
# Open http://<ip>:8000
Dashboard features:
- ▶ Auto-Mix mode: scripted normal traffic punctuated by attacks
- Manual attack triggers: [Fuzzers] [DoS] [Exploits] [Generic] [Recon]
- Speed slider (0.5x–5x)
- Live flow feed with color-coded rows
- Alert panel with confidence bars
- Pie chart (normal vs attack) + bar chart (alerts by type)
- Togglable ML probability view per flow
Pi deployment:
Three artifacts must be present on the Pi. The model (models/nfstream/lgbm_ids.pkl) is
already there if the live daemon is installed. The other two are built on the Mac
and synced:
# On Mac: build both artifacts
uv run python scripts/build_simulator_prototypes.py
cd web && npm run build && cd ..
# Sync both to Pi
rsync -avz models/nfstream/simulator_prototypes.json pi@<pi-ip>:/home/pi/ml-ids/models/nfstream/
rsync -avz web/dist/ pi@<pi-ip>:/home/pi/ml-ids/web/dist/
# On Pi: verify artifacts are present
ls models/nfstream/simulator_prototypes.json
ls web/dist/index.html
# Run server (no Node.js needed)
uv run python -m src.simulator.server \
--model-dir models/nfstream \
--static-dir web/dist \
--host 0.0.0.0 --port 8000
Alternatively, build prototypes on the Pi itself if the training data lives there:
uv run python scripts/build_simulator_prototypes.py
What gets synced vs. what's already on the Pi:
| Artifact | How it reaches Pi |
|---|---|
| models/nfstream/lgbm_ids.pkl | Already present (installed with live daemon) |
| models/nfstream/simulator_prototypes.json | Manual: rsync from Mac, or build on Pi |
| web/dist/ | Manual: rsync from Mac (Node.js not needed on Pi) |
The simulator imports the same InferenceEngine as the live daemon, so for
identical flow features it produces the same classifications.
Live capture mode (real traffic):
# Start simulator with live wlan0 capture alongside synthetic
uv run python -m src.simulator.server \
--model-dir models/nfstream \
--static-dir web/dist \
--live-iface wlan0 \
--live-log-file /var/log/ml-ids/alerts-live.jsonl \
--host 0.0.0.0 --port 8000
The dashboard shows a Source toggle: Live / Synthetic / Both. Attack buttons inject synthetic attacks into the live feed for demos.
Note: For now, the simulator runs its own nfstream capture independently of
deploy.py. Both processes can coexist on wlan0. Future direction: have the
simulator consume flow events from deploy.py (via its alert log or a local
socket) rather than running a second capture.
- Unify simulator and daemon capture: Have the simulator consume flows from deploy.py via IPC instead of running its own nfstream instance. See docs/plans/live-capture-mode.md.
- Hyperparameter tuning on nfstream features: Run Optuna study on the 13-feature set to optimize for Pi deployment. See docs/plans/nfstream-13-feature-training.md.
- Deriving state from TCP flags: Could reconstruct TCP connection state from nfstream's SYN/FIN/RST/ACK counters, recovering a third categorical feature.
- iptables reactive blocking (v2): Auto-block high-confidence attack source IPs.
- Metrics export: Prometheus endpoint for flow rates, alert counts, prediction distributions.
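For the idea of deriving state from TCP flags, one possible heuristic is sketched below. It is speculative: the rules and their ordering are guesses, not a validated reconstruction of how the UNSW-NB15 state column was produced, and the function name `derive_state` is hypothetical. The returned labels (REQ, CON, FIN, RST, INT) are values that appear in the dataset's state column; the inputs are per-flow counts of packets carrying each flag.

```python
def derive_state(syn, fin, rst, ack):
    """Guess a connection state label from per-flow TCP flag packet counts."""
    if rst > 0:
        return "RST"  # a reset was seen at some point in the flow
    if fin > 0:
        return "FIN"  # an orderly teardown was started
    if syn > 0 and ack > 0:
        return "CON"  # handshake was answered and the flow is still open
    if syn > 0:
        return "REQ"  # SYN sent but nothing acknowledged
    return "INT"  # assumed fallback when no handshake flags were seen


print(derive_state(syn=1, fin=0, rst=0, ack=0))  # unanswered SYN
print(derive_state(syn=2, fin=2, rst=0, ack=10))  # completed and closed flow
```

Any such mapping would need to be checked against the training data's state distribution before the feature is added to the model.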
| Model | Features | Accuracy | Macro F1 |
|---|---|---|---|
| Full (Phase A) | 38 | 69.76% | 0.5265 |
| nfstream (Phase B) | 13 | 69.23% | 0.4983 |
| Delta | -25 | -0.53pp | -0.0282 |
The 13-feature nfstream model retains 99.2% of full-feature accuracy and 94.6% of macro F1. On this test set, the 25 dropped features (connection-time windows, TCP state, TTL, loss, load, window size, TCP base, etc.) add little marginal signal for classification.
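The delta row and the retention percentages follow directly from the two reported metric sets. A small check, using only the numbers from the tables above (the variable names are illustrative):

```python
full = {"features": 38, "accuracy": 69.76, "macro_f1": 0.5265}
nfstream = {"features": 13, "accuracy": 69.23, "macro_f1": 0.4983}

# Absolute differences (nfstream minus full), as in the Delta row.
delta_features = nfstream["features"] - full["features"]
delta_acc_pp = round(nfstream["accuracy"] - full["accuracy"], 2)
delta_f1 = round(nfstream["macro_f1"] - full["macro_f1"], 4)

# Share of the full-feature score that the 13-feature model retains, in percent.
acc_retained = round(100 * nfstream["accuracy"] / full["accuracy"], 1)
f1_retained = round(100 * nfstream["macro_f1"] / full["macro_f1"], 1)

print(delta_features, delta_acc_pp, delta_f1)  # -25 -0.53 -0.0282
print(acc_retained, f1_retained)  # 99.2 94.6
```

Note that the accuracy delta is in percentage points, while the retention figures are ratios of the two scores.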
- Python 3.11
- LightGBM (multi-class classifier)
- scikit-learn (preprocessing)
- pandas (data loading)
- nfstream (flow capture on Pi)
- joblib (model serialization)
- pytest (testing)