🏦 Bank Marketing Data Cleaning

Cleaning, reformatting, and splitting bank marketing campaign data for PostgreSQL ingestion.

📖 Background

Personal loans are a lucrative revenue stream for banks. The typical interest rate of a two-year loan in the United Kingdom is around 10%. In September 2022 alone, UK consumers borrowed around £1.5 billion — approximately £300 million in interest generated by banks over two years.

This project cleans data collected from a bank's recent marketing campaign aimed at getting customers to take out a personal loan. The cleaned data is structured to be imported into a PostgreSQL database, enabling future campaigns to reuse the same schema seamlessly.

📁 Project Structure

├── bank_marketing.csv      # Raw input data
├── notebook.ipynb          # Main Jupyter Notebook
├── client.csv              # Output: client demographic data
├── campaign.csv            # Output: campaign interaction data
└── economics.csv           # Output: economic indicator data

🎯 Objective

Clean, reformat, and split bank_marketing.csv into three separate CSV files:

File	Description
`client.csv`	Client demographic and financial profile
`campaign.csv`	Campaign contact history and outcomes
`economics.csv`	Macroeconomic indicators at time of contact

🗂️ Output Schema

`client.csv`

Column	Type	Description	Cleaning
`client_id`	`integer`	Client ID	—
`age`	`integer`	Age in years	—
`job`	`object`	Type of job	Replace `"."` → `"_"`
`marital`	`object`	Marital status	—
`education`	`object`	Education level	Replace `"."` → `"_"`, `"unknown"` → `NaN`
`credit_default`	`bool`	Credit in default?	`1` if `"yes"`, else `0`
`mortgage`	`bool`	Has housing loan?	`1` if `"yes"`, else `0`

`campaign.csv`

Column	Type	Description	Cleaning
`client_id`	`integer`	Client ID	—
`number_contacts`	`integer`	Contacts in current campaign	—
`contact_duration`	`integer`	Last contact duration (seconds)	—
`previous_campaign_contacts`	`integer`	Contacts in previous campaign	—
`previous_outcome`	`bool`	Previous campaign outcome	`1` if `"success"`, else `0`
`campaign_outcome`	`bool`	Current campaign outcome	`1` if `"yes"`, else `0`
`last_contact_date`	`datetime`	Last contact date	Built from `day` + `month` + `2022`, format: `YYYY-MM-DD`

`economics.csv`

Column	Type	Description	Cleaning
`client_id`	`integer`	Client ID	—
`cons_price_idx`	`float`	Consumer price index	—
`euribor_three_months`	`float`	Euribor 3-month rate	—

🧹 Solution

import pandas as pd
import numpy as np

df = pd.read_csv("bank_marketing.csv")

# ── client.csv ────────────────────────────────────────────────────────────────
client = df[["client_id", "age", "job", "marital", "education",
             "credit_default", "mortgage"]].copy()

client["job"]       = client["job"].str.replace(".", "_", regex=False)
client["education"] = (client["education"]
                       .str.replace(".", "_", regex=False)
                       .replace("unknown", np.NaN))

client["credit_default"] = client["credit_default"].apply(lambda x: 1 if x == "yes" else 0).astype(bool)
client["mortgage"]       = client["mortgage"].apply(lambda x: 1 if x == "yes" else 0).astype(bool)

client.to_csv("client.csv", index=False)

# ── campaign.csv ──────────────────────────────────────────────────────────────
campaign = df[["client_id", "number_contacts", "contact_duration",
               "previous_campaign_contacts", "previous_outcome",
               "campaign_outcome", "day", "month"]].copy()

campaign["previous_outcome"] = campaign["previous_outcome"].apply(lambda x: 1 if x == "success" else 0).astype(bool)
campaign["campaign_outcome"] = campaign["campaign_outcome"].apply(lambda x: 1 if x == "yes" else 0).astype(bool)

campaign["last_contact_date"] = pd.to_datetime(
    campaign["day"].astype(str) + "-" + campaign["month"] + "-2022",
    format="%d-%b-%Y"
).dt.strftime("%Y-%m-%d")

campaign = campaign.drop(columns=["day", "month"])

campaign.to_csv("campaign.csv", index=False)

# ── economics.csv ─────────────────────────────────────────────────────────────
economics = df[["client_id", "cons_price_idx", "euribor_three_months"]].copy()

economics.to_csv("economics.csv", index=False)

⚙️ How to Run

Clone the repository

git clone https://github.com/your-username/bank-marketing-cleaning.git
cd bank-marketing-cleaning

Install dependencies
```
pip install pandas numpy
```

Run the notebook or script

jupyter notebook notebook.ipynb
# or
python solution.py

Check outputs — client.csv, campaign.csv, and economics.csv will be generated in the project root.

🔍 Key Transformations

Boolean encoding — "yes" / "success" → True (1), all others → False (0)
String cleaning — dots replaced with underscores in job and education fields
Null handling — "unknown" education values replaced with NaN
Date construction — last_contact_date assembled from separate day, month columns with year fixed at 2022

🛠️ Tech Stack

Python 3.8+
pandas — data manipulation and CSV I/O
NumPy — NaN handling

📜 License

This project is licensed under the MIT License.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🏦 Bank Marketing Data Cleaning

📖 Background

📁 Project Structure

🎯 Objective

🗂️ Output Schema

`client.csv`

`campaign.csv`

`economics.csv`

🧹 Solution

⚙️ How to Run

🔍 Key Transformations

🛠️ Tech Stack

📜 License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
README.md		README.md
bank_marketing.csv		bank_marketing.csv
campaign.csv		campaign.csv
client.csv		client.csv
economics.csv		economics.csv
notebook.ipynb		notebook.ipynb
piggy_bank.jpg		piggy_bank.jpg

Folders and files

Latest commit

History

Repository files navigation

🏦 Bank Marketing Data Cleaning

📖 Background

📁 Project Structure

🎯 Objective

🗂️ Output Schema

client.csv

campaign.csv

economics.csv

🧹 Solution

⚙️ How to Run

🔍 Key Transformations

🛠️ Tech Stack

📜 License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`client.csv`

`campaign.csv`

`economics.csv`

Packages