A compact, reproducible pipeline that generates transactional data, scores anomalies with ML (Isolation Forest), adds auditable business rules, and exports clean tables for Power BI. It’s designed to show how I approach audit & digital assurance with a pragmatic ML + IT mindset: engineered features, cost-aware thresholds, explainable cases, and a clean export layer decoupled from the dashboard.
- Hybrid detection: unsupervised ML (Isolation Forest) + explicit rules for transparent coverage.
- Cost-aware tuning: threshold sweep with precision/recall, workload %, FPR, and expected cost (C_FP vs C_FN).
- Explainability: top-N case sheets with simple reasons (
amount_z,hour,new_country,merchant_novelty) for auditors. - Reproducible exports: one click/command creates a timestamped
exports/run_*folder ready for Power BI. - Engineering quality: clear
src/package, centralized paths, environment override, CLI scripts, deterministic seeds.
├─ src/
│ ├─ paths.py # project root, exports dir, run dir (timestamped)
│ ├─ export_utils.py # CSV/JSON writers for all artifacts
│ ├─ data_generator.py # synthetic data with injected anomaly types
│ └─ anomaly_detector.py # feature eng, rules, Isolation Forest, metrics, exports
├─ exports/ # auto-created; contains run_* subfolders
├─ run_exports_demo.py # optional launcher/orchestrator
├─ dashboard_anomalies.pbix # Power BI dashboard
├─ environment.yml # full conda environment
├─ requirements.txt # minimal pip fallback
└─ docs/img/ # screenshots for README
conda env create -f environment.yml
conda activate anomalies# either
python src/data_generator.py
# or (if src is a package in your PYTHONPATH)
python -m src.data_generatorOutputs:
exports/data_YYYY-MM-DD_HH-MM-SS/transactions_synth.csv# Use the latest generated dataset automatically:
python src/anomaly_detector.py --seed 42 --default_threshold 0.85
# Or point explicitly to a dataset:
python src/anomaly_detector.py \
--data exports/data_*/transactions_synth.csv \
--seed 42 --default_threshold 0.85This creates:
exports/run_YYYY-MM-DD_HH-MM-SS/
├─ transactions_scored.csv # each transaction with ML score, flags, reasons
├─ thresholds.csv # sweep of thresholds with metrics (ML-only & Hybrid)
├─ metrics_by_type.csv # precision/recall/F1 per anomaly type
├─ confusion_matrix.csv # global confusion at the chosen default threshold
└─ run.json # run metadata (paths, sizes, parameters)Creates realistic synthetic payments with channels (web, app, atm, branch), categories, geos, timestamps, and injects labelled anomalies:
- HIGH_AMOUNT — extreme amounts
- ODD_HOUR — suspicious night activity
- DUPLICATE — fast duplicates on
(customer, amount, merchant_category) - BURST — bursts in a short window
- IMPOSSIBLE_TRAVEL / NEW_COUNTRY — improbable geolocation patterns
Saves transactions_synth.csv in a timestamped exports/data_* folder.
Adds interpretable features for both ML and rules:
- Temporal:
hour,dayofweek,time_since_last(per customer) - Amount scaling & context:
amount_log,amount_z(per-customer z-score) - Behavioural:
merchant_novelty(rare merchant for the customer)
Auditable heuristics:
rule_night_high_amount: amount > 500 during 00:00–04:00rule_new_country: first time a customer appears in a countryrule_burst: rolling 10-minute burst count per customerrule_travel: simplified travel flag (e.g., countryCNfrom generator)rule_duplicate: same(customer, amount, category)within 5 minutes
Aggregate to is_rule_flag.
- Isolation Forest (unsupervised),
contamination≈0.025 - Score normalization to 0–1 (
score) - ML flag via threshold:
is_ml_flag
- Final decision:
is_flag = is_ml_flag OR is_rule_flag - Threshold sweep (
thresholds.csv): precision, recall, FPR, workload %, and expected cost withC_FP=1,C_FN=10 - Global confusion at a chosen
--default_threshold(confusion_matrix.csv) metrics_by_type.csvto see what types the system misses or over-flagscases.csv: top-N alerts withreasons_top3andtriggered_rulesfor reviewers
Two pages fed only by the exported CSVs:
-
Threshold / KPIs
- Slicer:
threshold(comes fromthresholds.csv) - KPIs: ML Precision/Recall/Workload %, Heuristic Precision/Recall/Workload %
- Bar chart: Recall by Anomaly Type (ML vs Heuristic)
- Slicer:
-
Anomaly Volume & Trends
- Cards: Total Anomalies, Anomaly Rate %
- Donut: detection breakdown (ML-only, Rules-only, Overlap)
- Bar: anomalies by Channel
- Line: Anomaly Count over Time (daily)
The dashboard is intentionally simple and auditor-friendly: it separates tuning (page 1) from operational monitoring (page 2).
High threshold (low workload, higher precision)
Threshold high (≈ 0.77)
- ML: Precision 27.50%, Recall 7.19%, Workload 0.50%
- Heuristic: Precision 1.99%, Recall 98.04%, Workload 94.26%
At this level, the system is very selective: only the most obvious anomalies are flagged.
This means:
- Precision increases → most ML alerts are relevant.
- Recall drops → many anomalies go undetected.
- Workload is minimal → only a tiny fraction of transactions are sent for review.
This setup is useful when audit teams want to focus on quality over quantity, with very limited review capacity.
Low threshold (higher recall, higher workload)
Threshold low (≈ 0.34)
- ML: Precision 4.69%, Recall 49.02%, Workload 19.99%
- Heuristic: Precision 1.97%, Recall 98.04%, Workload 95.23%
Here the model is much more permissive:
- Recall increases significantly (almost half of true anomalies are caught by ML).
- Precision drops sharply → many false positives enter the pipeline.
- Workload rises → ~20% of transactions must be reviewed by humans.
This setup favors coverage and risk detection over efficiency, which may overwhelm auditors.
Operational view
- Total anomalies: 153 (Rate 1.91%)
- Detection breakdown: mostly Rules-only, some ML-only, tiny Overlap
- Channel:
webgenerates the highest anomaly volume; followed byatm,app, thenbranch - Trend: one clear spike on 15 February with 27 anomalies detected in a single day, possibly linked to a batch effect or unusual event.
Outside of this spike, anomalies remain stable and low across the period.
Takeaway
- Rules provide broad coverage (~98% recall) but at the cost of very high workload.
- ML enables a trade-off: by tuning the threshold, you can decide between catching more anomalies (high recall, high workload) or focusing on fewer, higher-quality cases (high precision, low workload).
- The hybrid approach combines transparency (rules) with adaptability (ML), making it suitable for audit environments where efficiency and explainability are both critical.
- Modeling: unsupervised Isolation Forest with behavioural features; calibrated via quantile-based thresholding and cost-aware selection.
- Explainability for audit: per-type metrics + human-readable reasons on cases.
- Engineering for teams: deterministic seeds, timestamped runs,
run.jsonmetadata, clean writers, env-overridable project root, CLI usage. - Separation of concerns: Python does the heavy lifting once; BI tools consume frozen exports — easy to refresh, easy to review.
You have two options:
-
Open the PBIX file directly
- Download the repository and open the included
dashboard_anomalies.pbixin Power BI Desktop. - The PBIX is already linked to the exported CSVs (
transactions_scored.csv,thresholds.csv,metrics_by_type.csv,confusion_matrix.csv). - As long as the folder structure is preserved, everything works out of the box. If needed, you can easily re-point the sources to the CSVs provided.
- Download the repository and open the included
-
Rebuild from scratch (optional)
- In Power BI Desktop: Get Data → Text/CSV and connect manually to the 4 files above.
- Create a simple star-like model (no complex relationships needed).
- Use
thresholds[threshold]as a slicer - Bind KPIs to the selected threshold row (ML vs Heuristic) and recreate visuals as shown.
The repo also includes screenshots for quick viewing if you don’t want to open Power BI.
This project demonstrates how hybrid anomaly detection (ML + rules) can be made both auditable and operational. By combining explainability, cost-aware metrics, and a reproducible Python-to-Power BI pipeline, it provides a practical example of how AI/IT solutions can support digital assurance and audit efficiency.