This project contains an end-to-end Exploratory Data Analysis (EDA) script written in SQL. The goal of this analysis is to evaluate a travel agency's internal operations, assess financial accuracy, and uncover macro-level business trends. By transforming, cleaning, and aggregating project data, this script prepares raw information for strategic decision-making and business intelligence (BI) dashboarding.
- Source: Kaggle
- Dataset: Travel Agency Project Data (Imported via CSV)
- Language: SQL (MySQL dialect)
- Techniques Utilized: Data Type Casting, Data Cleaning, Time Series Aggregation, Cross-Tabulation, Financial Variance Analysis, Standard Deviation (Risk Assessment).
- Downstream Tools: Designed to output clean datasets for Tableau or Power BI.
The SQL script (Travel_agency_operations_eda.sql) is broken down into 10 distinct phases:
- Database Setup & Data Import: Establishing the schema and loading the CSV data.
- Data Understanding: Initial volume checks and row counts.
- Data Preparation & Cleaning: Converting generic strings to
DATEandDECIMALtypes, checking for NULLs, and verifying data integrity against duplicates. - Univariate Analysis: Analyzing the distributions of individual variables like Project Type and Project Status.
- Bivariate & Multivariate Analysis: Examining relationships, such as total revenue generated by different departments and revenue spreads across project roles.
- Time Series Analysis: Tracking project volume and revenue growth over years, as well as identifying peak seasonal months.
- Cross-Tabulation & Deep Dive Categorical Analysis: Deep diving into categorical intersections, like finding the most lucrative project types per audience segment, and identifying operational bottlenecks by mapping project statuses to specific departments.
- Financial Accuracy Metrics: Comparing actual vs. estimated revenues to determine forecast accuracy across different project categories and target audiences.
- Advanced Time Series: Rolling data up into standard fiscal quarters to assess macro-level business cycles.
- Risk Assessment: Calculating the Standard Deviation of revenue by project type to identify the financial volatility/unpredictability of specific offerings.
- Which departments are driving the most revenue?
- What times of the year represent our peak seasons and highest revenue-generating months?
- How accurately is the agency forecasting revenue, and which teams are over/under-predicting?
- Which project types carry the highest financial risk (volatility)?
- Are there operational bottlenecks within specific departments based on project status pipelines?
- Clone this repository to your local machine.
- Open your preferred SQL IDE (e.g., MySQL Workbench).
- Run the initial
CREATE DATABASE travel_agency_db;command. - Use the Table Data Import Wizard** to import your Kaggle CSV file into a table named
travel_agency_projects. - Execute the queries sequentially to clean the data and generate insights.
- Export the resulting tables to Tableau or Power BI for visualization.