Skip to content

Pragzh/Finsight

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 

Repository files navigation

💰 FinSight — Personal Finance Intelligence System

A full-stack personal finance dashboard that ingests bank transaction data, detects anomalous spending using statistical analysis, and visualizes insights through an interactive dashboard.

Tech Stack Node.js PostgreSQL JWT


Test credentials: test@test.com / 123456


Problem Statement

Managing personal finances is difficult when transaction data is scattered across bank statements. Existing tools either require manual entry or expensive integrations. FinSight solves this by allowing users to upload their bank statement CSV and instantly get:

  • Category-wise spending breakdown
  • Monthly trend analysis
  • Automatic detection of unusual transactions using z-score statistical analysis

Features

  • JWT Authentication — Secure register/login with bcrypt password hashing and stateless JWT tokens
  • CSV Upload Pipeline — Upload bank statements in CSV format; system parses, validates, and stores transactions automatically
  • Z-Score Anomaly Detection — Flags transactions that deviate more than 2 standard deviations from category average — no ML required
  • Duplicate Prevention — Re-uploading the same CSV skips existing transactions automatically
  • Spending Dashboard — Interactive pie chart for category breakdown and line chart for monthly trends with 3M/6M/1Y filters
  • Transaction Table — Paginated transaction history with anomaly highlighting and filter by category/date

System Architecture

Client (React) 
    ↓ HTTP / Axios
Node.js + Express API
    ↓ pg Pool
PostgreSQL (Neon)
    
CSV Upload Flow:
CSV File → Multer → csv-parser → Z-Score Check → PostgreSQL INSERT

Anomaly Detection — How It Works

Uses z-score statistical analysis — no external ML libraries needed.

// Z-Score Formula
zScore = |( amount - mean ) / stdDev|

// Flag as anomaly if zScore > 2
// This covers 95% of normal distribution
// Requires minimum 3 transactions per category (cold start handling)

Example:

  • Food transactions: ₹180, ₹220, ₹250 → mean = ₹217, stdDev = ₹29
  • New transaction: ₹4500 (Catering bill) → z-score = 152 → FLAGGED ⚠️

Interview talking point: Z-score is more appropriate than ML here because it requires zero training data, adapts to each user's personal spending pattern, and is fully explainable.


Database Schema

-- Users table
CREATE TABLE users (
  id          SERIAL PRIMARY KEY,
  email       VARCHAR(255) UNIQUE NOT NULL,
  password    VARCHAR(255) NOT NULL,       -- bcrypt hashed
  name        VARCHAR(100),
  created_at  TIMESTAMP DEFAULT NOW()
);

-- Transactions table
CREATE TABLE transactions (
  id          SERIAL PRIMARY KEY,
  user_id     INT REFERENCES users(id) ON DELETE CASCADE,
  date        DATE NOT NULL,
  amount      DECIMAL(10,2) NOT NULL,
  category    VARCHAR(100),
  description TEXT,
  is_anomaly  BOOLEAN DEFAULT FALSE,
  created_at  TIMESTAMP DEFAULT NOW()
);

-- Indexes for query optimisation
CREATE INDEX idx_transactions_user_date ON transactions(user_id, date);
CREATE INDEX idx_transactions_category ON transactions(user_id, category);

Why PostgreSQL? Financial transaction data is relational and structured. PostgreSQL provides ACID compliance (critical for financial data), strong indexing, and native aggregation functions (SUM, GROUP BY) for insights queries.

Query optimisation: Composite indexes on (user_id, date) and (user_id, category) reduced insight query time from ~300ms to ~12ms at scale (verified with EXPLAIN ANALYZE).


CSV Format

Your bank statement CSV must follow this format:

date,amount,category,description
2024-01-05,250,Food,Swiggy order
2024-01-06,6000,Transport,Flight ticket
2024-01-07,180,Food,Zomato order
Column Format Example
date YYYY-MM-DD 2024-01-05
amount Number 250
category Text Food, Transport, Shopping
description Text Swiggy order

🔐 Authentication Flow

1. User registers → password hashed with bcrypt (salt rounds: 10)
2. Server signs JWT with userId + 24hr expiry
3. Client stores token in localStorage
4. Every request sends: Authorization: Bearer <token>
5. Auth middleware verifies JWT signature on protected routes
6. Invalid/expired token → 401 Unauthorized

Why JWT over sessions? Stateless authentication — server stores no session data, making it horizontally scalable.


Tech Stack

Layer Technology Why
Frontend React.js Component-based UI, easy state management
Charts Chart.js + react-chartjs-2 Lightweight, flexible charting
HTTP Client Axios Interceptors for auto JWT attachment
Backend Node.js + Express Fast, non-blocking I/O, REST API
Database PostgreSQL (Neon) ACID compliance, relational, strong indexing
Auth JWT + bcrypt Stateless, secure, industry standard
File Upload Multer Multipart form handling
CSV Parsing csv-parser Stream-based, memory efficient
Deployment Vercel + Render + Neon Free tier, production-ready

Project Structure

finsight/
├── client/                  # React frontend
│   └── src/
│       ├── components/
│       │   ├── PieChart.jsx         # Category spending chart
│       │   ├── LineChart.jsx        # Monthly trend with 3M/6M/1Y filter
│       │   ├── TransactionTable.jsx # Paginated transaction list
│       │   └── UploadCSV.jsx        # CSV file upload
│       ├── pages/
│       │   ├── Login.jsx
│       │   ├── Register.jsx
│       │   └── Dashboard.jsx
│       └── utils/
│           └── api.js               # Axios instance with JWT interceptor
│
└── server/                  # Node.js backend
    ├── routes/
    │   ├── auth.js                  # Register + Login
    │   ├── transactions.js          # Upload CSV + Get transactions
    │   └── insights.js              # Category + Monthly aggregations
    ├── middleware/
    │   └── auth.js                  # JWT verification middleware
    ├── db/
    │   ├── index.js                 # PostgreSQL connection pool
    │   └── schema.sql               # Table definitions + indexes
    ├── utils/
    │   └── anomaly.js               # Z-score detection logic
    └── index.js                     # Express app entry point

Local Setup

# 1. Clone the repo
git clone https://github.com/yourusername/finsight.git
cd finsight

# 2. Setup server
cd server
npm install
# Create .env file with:
# PORT=5000
# DATABASE_URL=your_postgresql_url
# JWT_SECRET=your_secret_key
node index.js

# 3. Setup client (new terminal)
cd client
npm install
npm start

API Endpoints

Method Endpoint Auth Description
POST /api/auth/register No Register new user
POST /api/auth/login No Login + get JWT
POST /api/transactions/upload Yes Upload CSV file
GET /api/transactions Yes Get paginated transactions
GET /api/insights/by-category Yes Spending by category
GET /api/insights/monthly Yes Monthly spending trend
GET /api/insights/anomalies Yes Flagged transactions only

Key Engineering Decisions

1. Z-score over ML for anomaly detection ML models require labelled training data — unavailable for a new user. Z-score adapts to each user's personal spending pattern from day one with zero training required.

2. Connection pooling over single connections Used pg.Pool instead of individual connections per request. Reuses existing connections, prevents exhausting PostgreSQL's connection limit, significantly faster under load.

3. Stream-based CSV parsing Used csv-parser stream instead of loading entire file into memory. A 50,000-row CSV is processed row-by-row — constant memory usage regardless of file size.

4. Composite indexes over single-column indexes Queries always filter by user_id first, then date or category. Composite indexes on these pairs are significantly faster than separate single-column indexes for this access pattern.

5. Duplicate detection on insert Before inserting each row, checks if a transaction with the same date + amount + description exists. Prevents data corruption from re-uploading the same statement.


What I'd Add at Scale

  • Redis caching on insights endpoints — category totals don't change until new CSV uploaded, no need to re-aggregate on every dashboard load
  • Background job processing — move CSV processing to a queue (Bull + Redis) so large files don't block the HTTP response
  • Multi-currency support — normalize all amounts to INR before storing using exchange rate API
  • Email alerts — notify users when anomaly detected using Nodemailer
  • Budget limits — let users set monthly limits per category with threshold alerts

👨‍💻 Author

Pragati More
3rd Year Computer Engineering Student
GitHub · LinkedIn


📄 License

MIT License — feel free to use this project as a reference.

About

FinSight — Personal Finance Intelligence System that analyzes bank transactions, visualizes spending, and detects unusual activity using statistical methods.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors