This repository showcases SQL scripts and data visualizations created to analyze the Citi Bike dataset. The project demonstrates my skills in data cleaning, querying, and visualizing insights to support decision-making, with an emphasis on SQL, data analysis, and presenting actionable insights.
## Contents

- Visualizations: Graphical representations of data insights, such as bike usage trends across stations.
- SQL Scripts: SQL queries for data extraction, cleaning, and analysis. Each query is documented with comments explaining its purpose.
- Datasets: Anonymized or sample datasets used in this analysis.
- Documentation: Detailed methodology and findings of the project, including insights and key takeaways.

## Project Overview

This project analyzes bike-sharing data to uncover trends, identify key patterns, and provide actionable insights for operational efficiency.
### Key Objectives

- Analyze bike collection frequency at each station to guide resource allocation.
- Identify peak hours and underutilized stations.
- Provide visual summaries of station activity for stakeholders.

## Key Insights

- Busiest Stations: SQL queries identified the top 5 stations by bike collection frequency, helping optimize bike redistribution.
- Peak Hours: Identified the time windows with the highest bike usage, to improve station readiness.
- Underutilized Stations: Highlighted stations with lower usage rates, which offer opportunities for improvement or resource reallocation.

## Tools and Skills Used

- SQL: Data extraction, cleaning, and trend analysis.
- SAS: Preprocessing and further statistical analysis of the dataset.
- Visualization tools: Bar charts and other visual summaries created in Excel and Power BI.

## Visualizations

- Busiest Stations Chart: Bike collection frequency for each station.
- Peak Hour Analysis: Usage trends across different times of day.

Files are available in the `/Visualizations` folder.

## SQL Scripts

Available scripts:

- Data Cleaning (`data_cleaning.sql`): Removes duplicates, fills missing values, and prepares the data for analysis.
- Busiest Stations (`busiest_stations.sql`): Identifies the top stations by bike collection frequency.
- Peak Hour Usage (`peak_hour_analysis.sql`): Analyzes hourly trends to determine peak times.

Each script contains inline comments explaining its purpose and execution.
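The sketch below illustrates the kind of logic described for the cleaning, busiest-station, peak-hour, and underutilized-station analyses. It is not the repository's actual SQL. It assumes a hypothetical `trips` table with `ride_id`, `started_at` (text timestamp), and `start_station_name` columns, loosely modeled on public Citi Bike trip files. The real column names and cleaning rules in `data_cleaning.sql` may differ. "Collection frequency" is assumed to mean the number of trips starting at a station. To keep the example self-contained, the queries run against an in-memory SQLite database through Python's standard `sqlite3` module. SQLite's `strftime` would need replacing on other engines, for example with `EXTRACT(HOUR FROM ...)`.

```python
import sqlite3

# Hypothetical sample rows (not real Citi Bike data):
# (ride_id, started_at, start_station_name)
SAMPLE_TRIPS = [
    ("r1", "2023-06-01 08:05:00", "Station A"),
    ("r1", "2023-06-01 08:05:00", "Station A"),  # exact duplicate row
    ("r2", "2023-06-01 08:40:00", "Station A"),
    ("r3", "2023-06-01 08:55:00", "Station B"),
    ("r4", "2023-06-01 17:10:00", "Station B"),
    ("r5", "2023-06-01 17:20:00", "Station A"),
    ("r6", "2023-06-01 12:00:00", "Station C"),
    ("r7", "2023-06-01 08:15:00", None),         # missing station name
]


def build_clean_db(rows):
    """Load raw rows, then apply hypothetical cleaning rules in SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE raw_trips (ride_id TEXT, started_at TEXT, start_station_name TEXT)"
    )
    conn.executemany("INSERT INTO raw_trips VALUES (?, ?, ?)", rows)
    # Cleaning sketch: DISTINCT drops exact duplicate rows, and COALESCE
    # fills missing station names with a placeholder label.
    conn.execute("""
        CREATE TABLE trips AS
        SELECT DISTINCT ride_id,
               started_at,
               COALESCE(start_station_name, 'Unknown') AS start_station_name
        FROM raw_trips
    """)
    return conn


def busiest_stations(conn, limit=5):
    """Top stations by collection frequency (trips starting there)."""
    return conn.execute("""
        SELECT start_station_name, COUNT(*) AS collections
        FROM trips
        WHERE start_station_name <> 'Unknown'  -- skip the placeholder label
        GROUP BY start_station_name
        ORDER BY collections DESC, start_station_name
        LIMIT ?
    """, (limit,)).fetchall()


def trips_by_hour(conn):
    """Trip counts per hour of day, busiest hour first."""
    # strftime('%H', ...) is SQLite-specific and returns a 2-digit hour string.
    return conn.execute("""
        SELECT CAST(strftime('%H', started_at) AS INTEGER) AS hour,
               COUNT(*) AS trip_count
        FROM trips
        GROUP BY hour
        ORDER BY trip_count DESC, hour
    """).fetchall()


def underutilized_stations(conn, max_trips):
    """Stations with at most `max_trips` collections (hypothetical threshold)."""
    return conn.execute("""
        SELECT start_station_name, COUNT(*) AS collections
        FROM trips
        WHERE start_station_name <> 'Unknown'
        GROUP BY start_station_name
        HAVING COUNT(*) <= ?
        ORDER BY collections, start_station_name
    """, (max_trips,)).fetchall()


if __name__ == "__main__":
    conn = build_clean_db(SAMPLE_TRIPS)
    print(busiest_stations(conn))                    # [('Station A', 3), ('Station B', 2), ('Station C', 1)]
    print(trips_by_hour(conn))                       # [(8, 4), (17, 2), (12, 1)]
    print(underutilized_stations(conn, max_trips=1))  # [('Station C', 1)]
```

This query derives the station list from the trips table itself. As a result, a station with zero collections never appears in `underutilized_stations`. Joining a full station list to the trip counts with a `LEFT JOIN` would surface those stations as well.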

