A self-hosted bank-statement analyzer for Saudi National Bank (SNB / AlAhli)
exports. Upload the transactionTable_*.xls you download from the web banking
portal and get back the numbers you actually care about — average monthly
income and spending, lowest and highest balance during the period, a clear
breakdown of where money came from and where it went, salary detection, top
spending categories and merchants, an optional visa eligibility check, and
a per-transaction notes field so you can annotate every row before submitting
your statement to a consulate.
Privacy: uploads are processed in memory and never written to disk by the web app. Notes are stored in your browser's
localStorageand never leave your machine. Run the analyzer locally; don't host it on a public server.
This is a near-rewrite of the original Analysis-TransactionTable-SNB project, keeping the same intent but with a real parser, categoriser, analytics layer, Flask UI and tests.
- Format-tolerant parser. Locates the header row by content rather than hard-coded index, drops JasperReports spacer columns, and reconstructs missing amounts from balance deltas — useful both for the bank's micro-fee rows and for users who redact amounts before sharing samples.
- Two-tier categoriser. MCC (Merchant Category Code) lookup first (Visa/Mastercard ranges), with English + Arabic keyword fallback for rows without an MCC.
- Salary detection. Two strategies in order: explicit
Payroll deposittype match, then pattern-based detection (recurring positive deposits clustered on a similar day-of-month and amount). - Money-flow breakdown. A two-column section showing every credit bucketed by source (Salary, Cash deposit, Refund, named transfers in, generic transfers in) and every debit bucketed by destination (ATM withdrawals, Government & SADAD bills, Bank fees & VAT, named transfers out, card spending grouped by category). Each bucket shows the total, transaction count, share-of-total bar, and date range.
- Per-transaction notes. A full searchable transactions table where
every row has an editable note field. Notes save to your browser's
localStorageinstantly and persist across page reloads. You can export all notes to a JSON file and re-import them on another machine. - Smart top merchants table. Raw merchant strings are noisy
(
barqvsBarq,STC BANKvsSTC Pay,PANDA RETAIL COvs10010 Panda 10010). The analyzer normalises them to a brand key — stripping POS terminal IDs, generic suffixes (CO, RETAIL, BANK, PAY), and case differences — so the same brand always appears as a single row in the top-merchants table. - Patterns we couldn't categorize section. A diagnostic block at the bottom of the dashboard listing every pattern that fell through to "Other" / "Card spending — Uncategorised", sorted by transaction count so the highest-leverage rule sits at the top. You can either type a category for each pattern (saved to your browser's localStorage, with the matching transactions in the table re-tagged immediately), or export the patterns as JSON to send back as a feature request.
- Bright, clean dashboard with four hero cards (avg income, avg spending, lowest balance, highest balance), Chart.js charts (monthly bars, balance line, category doughnut), top-merchants table and monthly breakdown table.
- Visa eligibility modal. A single "Check visa eligibility" button
opens a focused popup with:
- Genuine-activity score with a colored progress bar
- Months with salary (e.g. 4 / 7) and salary consistency score
- Lowest 28-day rolling-window minimum balance
- Optional threshold compliance check
- Largest unexplained credit (if any)
- Six-item findings checklist
- CLI for scripting:
python run.py --cli statement.xls. - 56 unit + integration tests covering parsing, categorisation, flow analysis, salary detection, visa metrics and Flask endpoints.
git clone <this-repo>
cd bank-statement-analyzer
python -m venv .venv
# Activate (Linux / macOS):
source .venv/bin/activate
# Activate (Windows / Git Bash):
source .venv/Scripts/activate
pip install -r requirements.txt
# Run tests
pytest
# Launch the web UI
python run.py
# -> open http://127.0.0.1:5000
# Or run the CLI
python run.py --cli samples/sample_statement.xls --threshold 10000A redacted sample statement ships in samples/sample_statement.xls so you
can see what the dashboard looks like without uploading your own data.
- Log into SNB's web banking portal.
- Open Account Statement for the account you want to analyse.
- Pick a date range — for visa applications, choose the period the consulate requires (commonly 3 to 6 months ending on the submission date).
- Choose English as the language.
- Click Export to Excel (filename will be
transactionTable_DD-MM-YYYY_en.xls). - Drop that file into the analyzer.
bank-statement-analyzer/
├── app/
│ ├── __init__.py # public `analyse()` entry point
│ ├── pipeline.py # parse + categorise + analyse pipeline
│ ├── parser/
│ │ ├── __init__.py
│ │ └── snb_parser.py # bank-format-specific Excel parser
│ ├── analytics/
│ │ ├── __init__.py
│ │ ├── categorizer.py # MCC + keyword categorisation
│ │ ├── monthly.py # monthly / category / merchant aggregations
│ │ ├── flow.py # income-source / outgoing-destination buckets
│ │ ├── salary.py # salary detection
│ │ └── visa.py # visa-friendly compliance report
│ └── web/
│ ├── __init__.py # Flask app factory + routes
│ └── templates/
│ ├── base.html # light theme, modal styles
│ ├── upload.html
│ └── dashboard.html # hero cards + flow + transactions + visa modal
├── tests/ # 56 tests
├── samples/sample_statement.xls
├── run.py # CLI + Flask launcher
├── requirements.txt
└── README.md
For each transaction we apply, in order:
- MCC lookup. Descriptions like
... CITY:0000682016SAM MCC-5814 ...are mapped via the official Visa/Mastercard MCC table to one of 16 categories (Restaurants, Groceries, Transport & Fuel, Telecom, Shopping, Travel, etc.). - Transaction-type keyword. SNB uses very consistent labels —
Payroll deposit,Cash withdrawal - Local ATM,VAT Charge,SADAD Payment— so this layer is highly reliable for SNB-internal actions. - Description keyword. English + Arabic substrings for common merchants (LULU, Tamimi, Carrefour, STC, Mobily, Saudia, نهدي, مطاعم …).
- Amount-sign fallback. Positive →
Transfer In, otherwiseOther.
Wallet detection (digital-wallet brands like STC Pay, Barq, D360, urpay, Tabby, Tamara, plus SamsungPay / ApplePay / GooglePay payment methods) runs in a priority pass before transaction-type rules, so a transfer labelled "Outgoing IPS local transfer" with "STC PAY" in the description correctly classifies as Wallet rather than as a generic transfer-out.
Categorisation accuracy is highest on un-redacted statements where the MCC codes are intact. On the redacted sample bundled with this repo, ~90% of rows are now categorised correctly — only the 3 "Local Internet purchase" rows lack any signal because their descriptions were stripped.
The Money Flow section answers two questions a visa officer typically asks:
Where did this money come from?
| Bucket | What lands here |
|---|---|
| Salary | Anything matching Payroll deposit / Salary |
| Cash deposit | ATM and over-the-counter cash deposits |
| Refund | Refunds and reversals |
Transfer from <name> |
Local / international transfers where the sender's name is in the description |
| Transfer in (other) | Generic incoming transfers |
| Other income | Anything else positive |
Where did this money go?
| Bucket | What lands here |
|---|---|
| ATM withdrawals | Cash withdrawals from ATMs |
| Government & SADAD bills | SADAD-routed payments (utilities, traffic fines, government fees) |
| Bank fees & VAT | Service charges, VAT charges, transfer fees |
Transfer to <name> |
Outgoing transfers where the recipient's name is in the description |
Card spending — <category> |
POS purchases bucketed by their category (Groceries, Restaurants, Transport, etc.) |
| Other outgoing | Anything else negative |
Each bucket shows the total amount, transaction count, share-of-total percentage with a colored bar, and the date range covered.
The "All transactions" section shows every row in the statement (not just
a preview) with a search box and a notes column. Click in any note field
and start typing — the note saves to your browser's localStorage the
moment you click out of the field. A short green flash confirms the save.
This is intentionally not stored on the server. The analyzer never sees your notes; they live on your machine only.
Consulates often ask follow-up questions about specific transactions ("What was this 50,000 SAR deposit on March 5th?"). Annotating each row in advance lets you build a paper trail you can hand to your visa agent or attach to your application bundle:
- "Sale of car — see attached transfer agreement"
- "Refund from Almosafer for cancelled flight"
- "Tuition payment for sister, Hawra — see receipt"
Two buttons next to the "All transactions" heading:
- Export notes (JSON) downloads a file containing every note keyed by transaction ID and a fingerprint identifying the statement. Good for backups or for moving notes between devices.
- Import notes loads notes from a previously exported file. If the fingerprint doesn't match the current statement, you'll be warned — notes for transactions that don't exist in the loaded statement are ignored.
Click "Check visa eligibility" in the dashboard header to open the modal. The metrics shown are intentionally generic — they don't encode any specific country's rules. You supply the threshold relevant to your application.
| Metric | Why it matters |
|---|---|
| Period covered (days) | Most consulates expect 3-6 months continuous history. |
| Lowest 28-day window minimum | The lowest daily balance over any 28-consecutive-day window. Many maintenance-funds rules check against this, not just the closing balance. |
| Threshold compliance | If you supply a threshold, we tell you whether every day of every 28-day window stayed above it. |
| Months with salary (e.g. 4 / 7) | How many of the months in the period contained a salary deposit. |
| Salary consistency score | Months with salary divided by months covered, expressed as a percentage. |
| Largest unexplained credit | A single large non-salary deposit shortly before submission is a classic red flag — we surface the largest one ≥ 3× median monthly income. |
| Genuine-activity score | Composite of coverage, salary regularity, transaction count, and the absence of suspicious credits. |
The threshold field on the upload form takes a plain number in the statement currency. You can change the currency label, but no FX conversion happens — convert your target threshold yourself before entering it.
Disclaimer. These metrics help you self-check before submitting. They are not a substitute for reading the official requirements of the visa you're applying for, or for advice from a qualified immigration adviser.
- Detect more category aliases (richer English + Arabic merchant lists)
- Support import wallets like Barq, STC Bank and D360 (different export formats)
- Threshold currency conversion using a published FX rate
- PDF report export of the dashboard
- User-defined category overrides
- Expense forecasting (12-week extrapolation)
- Multi-statement merge (combine statements from multiple accounts)
If you have a statement that fails to parse, please open an issue
with a redacted sample: replace amounts and descriptions with XXXX
but leave the balance column intact so the parser still reconciles.
Run the test suite before submitting a PR:
pytest -v