Skip to content

dariusmark-tech/Cleaning-Bank-Marketing-Campaign-Data

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

🏦 Bank Marketing Data Cleaning

Python Pandas NumPy License

Cleaning, reformatting, and splitting bank marketing campaign data for PostgreSQL ingestion.


📖 Background

Personal loans are a lucrative revenue stream for banks. The typical interest rate of a two-year loan in the United Kingdom is around 10%. In September 2022 alone, UK consumers borrowed around £1.5 billion — approximately £300 million in interest generated by banks over two years.

This project cleans data collected from a bank's recent marketing campaign aimed at getting customers to take out a personal loan. The cleaned data is structured to be imported into a PostgreSQL database, enabling future campaigns to reuse the same schema seamlessly.


📁 Project Structure

├── bank_marketing.csv      # Raw input data
├── notebook.ipynb          # Main Jupyter Notebook
├── client.csv              # Output: client demographic data
├── campaign.csv            # Output: campaign interaction data
└── economics.csv           # Output: economic indicator data

🎯 Objective

Clean, reformat, and split bank_marketing.csv into three separate CSV files:

File Description
client.csv Client demographic and financial profile
campaign.csv Campaign contact history and outcomes
economics.csv Macroeconomic indicators at time of contact

🗂️ Output Schema

client.csv

Column Type Description Cleaning
client_id integer Client ID
age integer Age in years
job object Type of job Replace ".""_"
marital object Marital status
education object Education level Replace ".""_", "unknown"NaN
credit_default bool Credit in default? 1 if "yes", else 0
mortgage bool Has housing loan? 1 if "yes", else 0

campaign.csv

Column Type Description Cleaning
client_id integer Client ID
number_contacts integer Contacts in current campaign
contact_duration integer Last contact duration (seconds)
previous_campaign_contacts integer Contacts in previous campaign
previous_outcome bool Previous campaign outcome 1 if "success", else 0
campaign_outcome bool Current campaign outcome 1 if "yes", else 0
last_contact_date datetime Last contact date Built from day + month + 2022, format: YYYY-MM-DD

economics.csv

Column Type Description Cleaning
client_id integer Client ID
cons_price_idx float Consumer price index
euribor_three_months float Euribor 3-month rate

🧹 Solution

import pandas as pd
import numpy as np

df = pd.read_csv("bank_marketing.csv")

# ── client.csv ────────────────────────────────────────────────────────────────
client = df[["client_id", "age", "job", "marital", "education",
             "credit_default", "mortgage"]].copy()

client["job"]       = client["job"].str.replace(".", "_", regex=False)
client["education"] = (client["education"]
                       .str.replace(".", "_", regex=False)
                       .replace("unknown", np.NaN))

client["credit_default"] = client["credit_default"].apply(lambda x: 1 if x == "yes" else 0).astype(bool)
client["mortgage"]       = client["mortgage"].apply(lambda x: 1 if x == "yes" else 0).astype(bool)

client.to_csv("client.csv", index=False)

# ── campaign.csv ──────────────────────────────────────────────────────────────
campaign = df[["client_id", "number_contacts", "contact_duration",
               "previous_campaign_contacts", "previous_outcome",
               "campaign_outcome", "day", "month"]].copy()

campaign["previous_outcome"] = campaign["previous_outcome"].apply(lambda x: 1 if x == "success" else 0).astype(bool)
campaign["campaign_outcome"] = campaign["campaign_outcome"].apply(lambda x: 1 if x == "yes" else 0).astype(bool)

campaign["last_contact_date"] = pd.to_datetime(
    campaign["day"].astype(str) + "-" + campaign["month"] + "-2022",
    format="%d-%b-%Y"
).dt.strftime("%Y-%m-%d")

campaign = campaign.drop(columns=["day", "month"])

campaign.to_csv("campaign.csv", index=False)

# ── economics.csv ─────────────────────────────────────────────────────────────
economics = df[["client_id", "cons_price_idx", "euribor_three_months"]].copy()

economics.to_csv("economics.csv", index=False)

⚙️ How to Run

  1. Clone the repository

    git clone https://github.com/your-username/bank-marketing-cleaning.git
    cd bank-marketing-cleaning
  2. Install dependencies

    pip install pandas numpy
  3. Run the notebook or script

    jupyter notebook notebook.ipynb
    # or
    python solution.py
  4. Check outputsclient.csv, campaign.csv, and economics.csv will be generated in the project root.


🔍 Key Transformations

  • Boolean encoding"yes" / "success"True (1), all others → False (0)
  • String cleaning — dots replaced with underscores in job and education fields
  • Null handling"unknown" education values replaced with NaN
  • Date constructionlast_contact_date assembled from separate day, month columns with year fixed at 2022

🛠️ Tech Stack

  • Python 3.8+
  • pandas — data manipulation and CSV I/O
  • NumPy — NaN handling

📜 License

This project is licensed under the MIT License.

About

Data cleaning is an essential skill for data engineers, encompassing reading, modifying, splitting, and storing data. In this notebook, data-cleaning skills to process information about marketing campaigns run by a bank.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors