7vik2005/TeleSentry-AI

📡 TeleSentry AI

Explainable Telecom Fraud Detection Platform using Machine Learning, Rule-Based Intelligence, SHAP, FastAPI, and Streamlit



📖 Overview

TeleSentry AI is an end-to-end Telecom Fraud Detection Platform designed to identify suspicious calling behavior using a combination of:

  • Rule-Based Fraud Intelligence
  • Isolation Forest Anomaly Detection
  • Random Forest Classification
  • SHAP Explainability
  • Interactive Streamlit Dashboard
  • FastAPI Prediction Service

The system simulates realistic telecom users and fraudsters, engineers behavioral telecom features, detects suspicious activities, explains predictions, and exposes results through a dashboard and API.


🎯 Problem Statement

Telecommunication fraud has become increasingly sophisticated.

Common fraud patterns include:

  • Digital Arrest Scams
  • Mass Calling Operations
  • Long Distance Fraud Rings
  • Social Engineering Networks
  • Automated Calling Bots

Static rule-based systems tend to miss fraud patterns that fall outside their predefined rules, while purely machine-learning systems often lack interpretability.

TeleSentry AI combines both approaches to deliver:

  • High detection accuracy
  • Transparent predictions
  • Real-time fraud assessment

🏗 Architecture

┌─────────────────────────────────────────┐
│          Synthetic Data Generator       │
│        (Telecom User Simulation)        │
└──────────────────┬──────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────┐
│           Raw Synthetic Dataset         │
│      generated_dataset.csv (13k+)       │
└──────────────────┬──────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────┐
│         Data Preprocessing Layer        │
│                                         │
│ • Cleaning                              │
│ • Validation                            │
│ • Train/Test Split                      │
└──────────────────┬──────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────┐
│        Feature Engineering Layer        │
│                                         │
│ • call_intensity                        │
│ • distance_per_call                     │
│ • contact_circle_ratio                  │
│ • delivery_pattern                      │
│ • high_freq_long_distance               │
└──────────────────┬──────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────┐
│          Rule Engine Layer              │
│                                         │
│ • Digital Arrest Detection              │
│ • Mass Calling Detection                │
│ • Long Distance Scam Detection          │
│ • Traveler Detection                    │
│ • Business User Detection               │
└──────────────────┬──────────────────────┘
                   │
                   ▼
      ┌─────────────────────────┐
      │      ML Layer           │
      │                         │
      │ Isolation Forest        │
      │ Random Forest           │
      └───────────┬─────────────┘
                  │
                  ▼
┌─────────────────────────────────────────┐
│          Evaluation Layer               │
│                                         │
│ Accuracy                                │
│ Precision                               │
│ Recall                                  │
│ F1 Score                                │
│ ROC-AUC                                 │
└──────────────────┬──────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────┐
│         Explainability Layer            │
│                                         │
│ SHAP Summary                            │
│ SHAP Waterfall                          │
│ Feature Importance                      │
└──────────────────┬──────────────────────┘
                   │
          ┌────────┴─────────┐
          ▼                  ▼
┌────────────────┐  ┌──────────────────┐
│ Streamlit UI   │  │   FastAPI API    │
│                │  │                  │
│ Dashboard      │  │ /predict         │
│ Analytics      │  │ /health          │
│ Live Predict   │  │ Swagger Docs     │
└────────────────┘  └──────────────────┘

System Flow

Synthetic Data Generation
          ↓
Data Preprocessing
          ↓
Feature Engineering
          ↓
Rule Engine
          ↓
Machine Learning Layer
          ↓
Evaluation Layer
          ↓
SHAP Explainability
          ↓
Streamlit Dashboard + FastAPI

📂 Project Structure

TeleSentry-AI/
│
├── api/
├── dashboard/
├── data/
├── notebooks/
├── reports/
├── saved_models/
├── src/
├── tests/
│
├── README.md
├── requirements.txt
├── requirements-lock.txt
├── LICENSE
├── VERSION
└── .env.example

⚙️ Features

Synthetic Telecom Dataset Generator

Generates realistic telecom profiles:

Legitimate Users

  • Delivery Partners
  • Business Users
  • Regular Subscribers
  • Traveling Professionals

Fraud Profiles

  • Digital Arrest Bots
  • Traditional Scammers
  • Low Volume Fraudsters
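
The generator's actual distributions are not described here. As a rough sketch of the idea, assuming only two of the profiles above and made-up value ranges (the `PROFILE_RANGES` table, the column names, and the 10% fraud rate are all hypothetical), a seeded simulator could look like this:

```python
import random

# Hypothetical (low, high) ranges per profile; none of these numbers
# come from the project.
PROFILE_RANGES = {
    "regular_subscriber": {
        "call_frequency": (5, 30),
        "avg_distance": (1.0, 50.0),
        "is_fraud": 0,
    },
    "digital_arrest_bot": {
        "call_frequency": (100, 300),
        "avg_distance": (300.0, 1500.0),
        "is_fraud": 1,
    },
}


def generate_users(n: int, fraud_rate: float = 0.1, seed: int = 0) -> list:
    """Return n simulated user rows; the same seed always gives the same rows."""
    rng = random.Random(seed)
    rows = []
    for user_id in range(n):
        is_bot = rng.random() < fraud_rate
        profile = "digital_arrest_bot" if is_bot else "regular_subscriber"
        spec = PROFILE_RANGES[profile]
        rows.append({
            "user_id": user_id,
            "profile": profile,
            "call_frequency": rng.randint(*spec["call_frequency"]),
            "avg_distance": round(rng.uniform(*spec["avg_distance"]), 1),
            "is_fraud": spec["is_fraud"],
        })
    return rows


users = generate_users(1000)
```

Seeding the random generator keeps the dataset reproducible between runs, which makes later model comparisons meaningful.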

Feature Engineering

Generated telecom intelligence features:

| Feature | Description |
| --- | --- |
| call_intensity | Calling activity level |
| distance_per_call | Average call distance ratio |
| contact_circle_ratio | Contact diversity ratio |
| delivery_pattern | Delivery behavior pattern |
| high_freq_long_distance | Suspicious high-volume calling |
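
The formulas behind these features are not given in this README. The sketch below shows one plausible way to derive four of them from the five raw inputs used by the `/predict` request. Every formula, the `CallProfile` class, and the thresholds 100 and 500 are illustrative assumptions, not the project's implementation; `delivery_pattern` is left out because nothing here hints at its definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallProfile:
    """Raw per-subscriber metrics, named after the /predict request fields."""
    avg_duration: float
    call_frequency: float
    unique_contacts: float
    avg_distance: float
    circle_diversity: float


def safe_ratio(numerator: float, denominator: float) -> float:
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def engineer_features(p: CallProfile) -> dict:
    """Hypothetical feature definitions; the real formulas may differ."""
    return {
        # Assumed: total talk volume as a proxy for activity level.
        "call_intensity": p.call_frequency * p.avg_duration,
        # Assumed: distance normalised by the number of calls.
        "distance_per_call": safe_ratio(p.avg_distance, p.call_frequency),
        # Assumed: contacts per telecom circle.
        "contact_circle_ratio": safe_ratio(p.unique_contacts, p.circle_diversity),
        # Assumed thresholds: more than 100 calls and more than 500 distance units.
        "high_freq_long_distance": int(
            p.call_frequency > 100 and p.avg_distance > 500
        ),
    }


# Values taken from the example request in the FastAPI section.
features = engineer_features(CallProfile(5, 150, 100, 600, 8))
```

Guarding the divisions matters in practice because a subscriber with zero calls or zero circles would otherwise raise `ZeroDivisionError`.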

Rule Engine

Fraud intelligence layer:

  • Digital Arrest Detection
  • Mass Calling Detection
  • Long Distance Scam Detection
  • Traveler Detection
  • Business User Detection
  • Delivery Pattern Detection
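
A rule layer of this kind can be sketched as a table of named predicates applied to each record. The rule names below echo three of the detections listed above, but the conditions and thresholds are placeholders chosen for illustration; the project's actual rules are not documented here.

```python
from typing import Callable, Dict, List

Record = Dict[str, float]

# Each rule maps a record to True when it fires. All thresholds are
# illustrative placeholders, not the project's values.
RULES: Dict[str, Callable[[Record], bool]] = {
    "mass_calling": lambda r: (
        r["call_frequency"] >= 100 and r["unique_contacts"] >= 80
    ),
    "long_distance_scam": lambda r: (
        r["avg_distance"] >= 500 and r["circle_diversity"] >= 5
    ),
    "digital_arrest": lambda r: (
        r["avg_duration"] >= 30 and r["unique_contacts"] <= 5
    ),
}


def triggered_rules(record: Record) -> List[str]:
    """Return the names of all rules that fire, in definition order."""
    return [name for name, rule in RULES.items() if rule(record)]


record = {
    "avg_duration": 5,
    "call_frequency": 150,
    "unique_contacts": 100,
    "avg_distance": 600,
    "circle_diversity": 8,
}
flags = triggered_rules(record)
```

Returning the names of the triggered rules, rather than a single yes/no flag, is what lets a dashboard show which rule fired for a given subscriber.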

Machine Learning Models

Isolation Forest

Purpose:

  • Unsupervised anomaly detection
  • Detection of unusual telecom behavior
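
For orientation, a minimal scikit-learn sketch of unsupervised anomaly detection follows. It runs on toy random data rather than the project's dataset, and the `contamination` value and feature count are assumptions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Toy data: 500 "normal" rows plus 10 rows placed far from them.
# The three columns stand in for engineered telecom features.
normal = rng.normal(loc=0.0, scale=1.0, size=(500, 3))
outliers = rng.uniform(low=6.0, high=12.0, size=(10, 3))
X = np.vstack([normal, outliers])

# contamination is the expected share of anomalies; 0.02 is an assumption.
model = IsolationForest(n_estimators=100, contamination=0.02, random_state=0)
labels = model.fit_predict(X)        # +1 = inlier, -1 = anomaly
scores = model.decision_function(X)  # lower = more anomalous

n_flagged = int((labels == -1).sum())
```

No labels are used during fitting, so this model can surface unusual behavior even for fraud types that never appeared in labeled training data.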

Random Forest

Purpose:

  • Supervised fraud classification
  • Fraud probability estimation
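
The supervised step can be sketched with scikit-learn's `RandomForestClassifier`. The data below is synthetic and well separated, and the hyperparameters and the 0.5 decision threshold are assumptions rather than the project's settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Toy, well-separated classes standing in for engineered telecom features.
X = np.vstack([
    rng.normal(0.0, 1.0, size=(300, 4)),
    rng.normal(3.0, 1.0, size=(300, 4)),
])
y = np.array([0] * 300 + [1] * 300)  # 0 = legitimate, 1 = fraud

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

clf = RandomForestClassifier(n_estimators=200, random_state=42)
clf.fit(X_train, y_train)

# Column 1 of predict_proba is P(class == 1) because clf.classes_ is [0, 1].
fraud_probability = clf.predict_proba(X_test)[:, 1]
predictions = (fraud_probability >= 0.5).astype(int)
```

The probability column is the quantity an API would typically report as `fraud_probability`; the threshold that turns it into a label is a separate, tunable choice.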

📊 Model Performance

| Metric | Score |
| --- | --- |
| Accuracy | 98%+ |
| Precision | 97%+ |
| Recall | 98%+ |
| F1 Score | 98%+ |
| ROC-AUC | 99%+ |
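
As a reminder of how the first four metrics are defined, here is a small dependency-free sketch that derives them from confusion-matrix counts. The labels are a made-up toy example, not the project's results, and ROC-AUC is omitted because it needs predicted scores rather than hard labels.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = fraud)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    # Each ratio falls back to 0.0 when its denominator is zero.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

In fraud detection, precision and recall are usually more informative than accuracy, since fraud cases tend to be a small share of all records.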

🧠 Explainable AI

TeleSentry AI uses SHAP (SHapley Additive Explanations).

Generated explanations include:

  • SHAP Summary Plot
  • SHAP Waterfall Plot
  • Feature Importance Analysis

Top fraud indicators:

  • avgCallDistance
  • circleDiversity
  • call_intensity
  • avgDuration
  • high_freq_long_distance
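
For reference, SHAP attributes a single prediction to its features additively. In standard SHAP notation (the symbols are not from this README), for a model output f(x) over M features:

```latex
f(x) = \phi_0 + \sum_{i=1}^{M} \phi_i
```

Here \phi_0 is the base value, the expected model output over the background data, and \phi_i is the SHAP value of feature i for this input. A waterfall plot draws this sum for one prediction, starting at \phi_0 and adding each \phi_i in turn, while a summary plot shows how the \phi_i are distributed across many predictions.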

📈 Dashboard

Interactive Streamlit dashboard provides:

Dataset Overview

  • Dataset statistics
  • Fraud distribution
  • User type analysis
  • Operator analysis

Model Analytics

  • Accuracy
  • Precision
  • Recall
  • F1 Score
  • ROC Curve
  • Confusion Matrix

Live Fraud Prediction

Predict fraud risk using telecom activity metrics.

Rule Engine Analytics

Visualize fraud intelligence triggers.

SHAP Explainability

Interpret model decisions.


🚀 FastAPI Backend

Endpoints:

Root

GET /

Health Check

GET /health

Prediction

POST /predict

Example Request:

{
  "avg_duration": 5,
  "call_frequency": 150,
  "unique_contacts": 100,
  "avg_distance": 600,
  "circle_diversity": 8
}

Example Response:

{
  "prediction": "FRAUD",
  "fraud_probability": 0.98,
  "risk_level": "CRITICAL"
}
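
A client call to this endpoint can be sketched with the standard library alone. The base URL assumes the API was started locally with uvicorn's default host and port; the helper names `build_predict_request` and `predict` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # assumed: uvicorn's default host and port


def build_predict_request(
    payload: dict, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the /predict endpoint."""
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(payload: dict, base_url: str = BASE_URL, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON reply; needs the API to be running."""
    request = build_predict_request(payload, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


payload = {
    "avg_duration": 5,
    "call_frequency": 150,
    "unique_contacts": 100,
    "avg_distance": 600,
    "circle_diversity": 8,
}
request = build_predict_request(payload)
```

With the server running, `predict(payload)` should return a dictionary shaped like the example response above.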

🛠 Installation

Clone Repository

git clone https://github.com/7vik2005/TeleSentry-AI.git

cd TeleSentry-AI

Install Dependencies

pip install -r requirements.txt

▶ Running The Project

Generate Dataset

python -m src.data_generation.generator

Apply Rule Engine

python -m src.rule_engine.rules

Train Models

python -m src.models.random_forest

Generate SHAP Explanations

python -m src.explainability.shap_explainer

Launch Dashboard

python -m streamlit run dashboard/app.py

Launch API

python -m uvicorn api.app:app --reload

📚 Technologies Used

  • Python
  • Pandas
  • NumPy
  • Scikit-Learn
  • SHAP
  • FastAPI
  • Streamlit
  • Plotly
  • Matplotlib
  • Faker

🔮 Future Enhancements

  • XGBoost Integration
  • Real Telecom Data Support
  • Real-Time Streaming Detection
  • Docker Deployment
  • Cloud Deployment
  • Automated Retraining Pipeline
  • MLOps Integration

👨‍💻 Author

Satvik Jambagi

Machine Learning | Data Science | AI Engineering


📜 License

This project is licensed under the MIT License.
