Skip to content

PartowRoshani/fraud-detection-ppr

Repository files navigation

fraud-detection-ppr

Introduction

This project is the final assignment for the Data structures and algorithems course at Shahid Beheshti University. The project implements a complete Personalized PageRank (PPR) analysis framework for large-scale graph datasets using C++ as the computational core and Python (Tkinter) for an interactive graphical interface.

The system is designed to support both batch experimental evaluation and interactive single-run analysis, enabling reproducible research as well as real-time demonstration of algorithmic behavior.

Objectives

The main goals and concepts practiced in this project include:

Advanced C++ programming and performance-oriented design

Graph algorithms and Personalized PageRank

Iterative numerical methods (power iteration)

Deterministic randomization and reproducibility

Handling large-scale graph datasets

Experimental evaluation and metric design

Scalability analysis using induced subgraphs

Separation of concerns (backend vs UI)

Interfacing C++ backend with a Python GUI

Structured data exchange using JSON and CSV

Pre-requirements

To build and run the project, the following tools are required:

C++ compiler supporting C++17 (g++, MinGW, or clang)

Python 3.9 or higher

Tkinter (included by default with Python)

VS Code or any C++-capable IDE

Operating System: Windows / Linux / macOS

Project Architecture

The project is intentionally divided into two execution modes:

1. Backend (Batch Mode)

Fully automated experimental pipeline

Processes multiple datasets, α values, and K values

Produces structured CSV files for analysis and plotting

Used for performance benchmarking and report generation

2. UI Mode (Interactive Mode)

Single-run execution controlled by a graphical interface

Generates a clean JSON output

Designed for live demonstrations and exploration

Does not duplicate algorithmic logic

#Implementation Overview

The project consists of the following main components:

PPR Core (C++)

Implements Personalized PageRank using power iteration

Supports weighted and unweighted graphs

Handles dangling nodes and teleportation correctly

Deterministic seed selection:

From ground-truth nodes when available

Random selection otherwise

Multiple evaluation metrics:

Precision@K (with seeds)

Precision@K (excluding seeds)

SeedPrecision@K (GT-free datasets)

Convergence tracking and iteration limits

Subgraph sampling for scalability evaluation

Backend Experimental Pipeline

Runs experiments across:

Multiple datasets

Multiple α values

Multiple K values

Full graph and sampled subgraphs

Outputs results into:

run_summary.csv

alpha_sensitivity.csv

precision.csv

scalability.csv

These outputs are designed for direct use in academic plots and reports.

Graphical User Interface (Python Tkinter)

Provides an interactive control panel for:

Dataset selection

Full graph vs subset execution

Alpha (α) adjustment via slider

K selection

Displays:

Runtime statistics

Convergence status

Iteration count

Precision metrics

Top-K ranked nodes

Communicates with the C++ core using a JSON-based interface

Contains no algorithmic logic (pure frontend)

Project Structure

project

│ ├─ ppr_runner.cpp # Main C++ implementation

├─ graph.h / graph.cpp # Graph representation and loader

├─ ui.py # Tkinter-based GUI

├─ data/ # Graph datasets

├─ results/ # CSV outputs (batch mode)

├─ result.json # JSON output (UI mode)

└─ README.md

How to Build and Run

Compile the Backend g++ -std=c++17 ppr_runner.cpp graph.cpp -O2 -o ppr_runner

Run Backend (Batch Mode)

ppr_runner.exe

This generates all CSV files under the results/ directory.

Run Backend in UI Mode (Single Run)

ppr_runner.exe --ui
--dataset bitcoin_alpha
--mode subset1
--alpha 0.15
--k 20
--out result.json

Launch Graphical User Interface

python ui.py

The GUI will open and allow interactive execution of the PPR algorithm.

Design Rationale

Separation of Concerns: The algorithmic core is completely isolated from the user interface.

Reproducibility: Deterministic random seeds ensure consistent experimental results.

Scalability: Subgraph sampling allows controlled evaluation of runtime growth.

Extensibility: New datasets, metrics, or UI features can be added without modifying the core logic.

Presentation

image image

Course Instructor:

Dr. Ali Katanforoush

- Team Members:

Date / Version

Winter 2026 Version 1.0

Resources :

PageRank and Personalized PageRank literature

C++17 Standard Documentation

Python Tkinter Documentation

Graph Mining and Network Analysis References

About

Fraud detection in transaction networks using Personalized PageRank (PPR), implemented from scratch with sparse graph representations and real-world datasets.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors