inf14/Unfinished-Business

⚽ Unfinished Business

An Arsenal FC Data Research Project — Can They Finally Win the Premier League in 2025/26?


📌 Overview

Three consecutive runner-up finishes. 84 points. 89 points. A squad built over years under Mikel Arteta. The numbers say Arsenal are close — but close isn't the title.

Unfinished Business is a data-driven research project that combines football analytics and machine learning to study Arsenal's Premier League performance from the 2022–23 season through the ongoing 2025–26 campaign.

Is the 2025/26 season finally Arsenal's?

🎯 Project Goals

| # | Goal |
|---|------|
| 1 | Build a structured multi-season match dataset |
| 2 | Perform statistical and ML-based performance analysis |
| 3 | Identify key factors influencing wins/losses |
| 4 | Compare multiple ML models for prediction |
| 5 | Extend to player-level tactical insights |

📊 Dataset

The dataset contains structured match-level data across four seasons (2022–23 to 2025–26), with one CSV file per season.

📁 Files

  • arsenal_22-23.csv
  • arsenal_23-24.csv
  • arsenal_24-25.csv
  • arsenal_25-26.csv
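As a rough sketch of how the four season files could be combined into one match list, the snippet below uses only the standard library. The file names come from the list above; the `load_seasons` helper, the `data_dir` argument, and the choice to parse numeric columns as integers (leaving blanks as `None`) are assumptions for illustration, not the project's actual loader.

```python
import csv
from pathlib import Path

# File names as listed in this README; the directory they live in is an assumption.
SEASON_FILES = [
    "arsenal_22-23.csv",
    "arsenal_23-24.csv",
    "arsenal_24-25.csv",
    "arsenal_25-26.csv",
]

# Columns assumed to hold whole numbers.
INT_COLUMNS = ("gw", "goals_for", "goals_against", "points", "opp_table_position")


def load_seasons(data_dir=".", filenames=SEASON_FILES):
    """Read each season CSV and return a single list of match dicts."""
    matches = []
    for name in filenames:
        path = Path(data_dir) / name
        with path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                for column in INT_COLUMNS:
                    # Keep blanks as None rather than guessing a fill value.
                    value = (row.get(column) or "").strip()
                    row[column] = int(value) if value else None
                matches.append(row)
    return matches
```

Calling `load_seasons("path/to/csvs")` would return the matches in file order, which keeps the seasons chronological as long as each file is sorted by gameweek.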

🔍 Key Features

| Feature | Description |
|---------|-------------|
| season | Season |
| gw | Gameweek |
| opponent | Opponent team |
| venue | Home/Away |
| goals_for | Goals scored |
| goals_against | Goals conceded |
| result | W/D/L |
| points | Match points |
| opp_table_position | Opponent rank |
| opp_strength_bucket | Strength category |

⚙️ Machine Learning Pipeline

🔹 Data Preprocessing

  • Missing values handled
  • Created features:
    • win (target variable)
    • goal_diff
    • is_home
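The three derived columns above can be sketched as follows. This assumes `result` holds the literal strings `W`, `D`, or `L` and `venue` holds `Home` or `Away`, as the feature table suggests; the `add_basic_features` helper is a hypothetical name, not a function from the project.

```python
def add_basic_features(match):
    """Return a copy of one match dict with win, goal_diff and is_home added."""
    enriched = dict(match)
    # Binary target: 1 for a win, 0 for a draw or loss.
    enriched["win"] = 1 if match["result"] == "W" else 0
    # Positive when Arsenal outscored the opponent.
    enriched["goal_diff"] = int(match["goals_for"]) - int(match["goals_against"])
    # Assumes the venue column is spelled "Home" or "Away" (case-insensitive here).
    enriched["is_home"] = 1 if str(match["venue"]).strip().lower() == "home" else 0
    return enriched
```

Note that `goal_diff` is known only after the match, which may be why it does not appear in the final feature list further down.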

🔹 Feature Engineering

  • Rolling form (form_last5)
  • Opponent encoding
  • Strength encoding

Final features used:

form_last5, is_home, opponent_encoded, strength_encoded, opp_table_position
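One plausible way to build `form_last5` and the encoded columns is sketched below. It assumes that form means the points collected in the previous five matches (excluding the current one, so the feature does not leak the result being predicted) and that the encodings are simple integer label codes. Both are assumptions; the project may define them differently, for example by resetting form at the start of each season.

```python
from collections import deque


def add_form_last5(matches, window=5):
    """Add form_last5: points from the previous `window` matches (current one excluded)."""
    recent = deque(maxlen=window)  # oldest result drops out automatically
    enriched_matches = []
    for match in matches:  # assumed to be in chronological order
        enriched = dict(match)
        enriched["form_last5"] = sum(recent)
        recent.append(int(match["points"]))
        enriched_matches.append(enriched)
    return enriched_matches


def label_encode(values):
    """Map each distinct value to an integer code, assigned in sorted order."""
    codes = {value: index for index, value in enumerate(sorted(set(values)))}
    return [codes[value] for value in values], codes
```

`label_encode` could be applied to the `opponent` column to get `opponent_encoded` and to `opp_strength_bucket` to get `strength_encoded`. Integer codes impose an arbitrary ordering, which tree-based models usually tolerate better than linear or distance-based ones.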

🤖 Models Used

  • Logistic Regression
  • Decision Tree
  • Random Forest
  • KNN
  • SVM
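A minimal sketch of comparing the five model families with scikit-learn is shown below. The hyperparameters, the use of 5-fold cross-validation, and the scaling step for the distance- and margin-based models are assumptions, and the demo data is random placeholder data rather than the Arsenal dataset, so the printed scores say nothing about the project's actual results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_models(seed=0):
    """Return the five model families named above, with assumed settings."""
    return {
        "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "Decision Tree": DecisionTreeClassifier(max_depth=4, random_state=seed),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
        "SVM": make_pipeline(StandardScaler(), SVC()),
    }


def compare_models(X, y, cv=5):
    """Return the mean cross-validated accuracy for each model."""
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
        for name, model in build_models().items()
    }


# Synthetic placeholder data standing in for the five final features.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(120, 5))
y_demo = (X_demo[:, 0] + 0.5 * X_demo[:, 1] + rng.normal(scale=0.5, size=120) > 0).astype(int)

scores = compare_models(X_demo, y_demo)
for name, accuracy in scores.items():
    print(f"{name}: {accuracy:.3f}")
```

Because matches are ordered in time, a chronological split (for example, training on earlier seasons and testing on the latest) may give a more honest estimate than shuffled folds.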

📈 Results & Visualizations

Graphs are stored in outputs/

📊 Model Accuracy

[Figure: Model Accuracy]

🌳 Feature Importance

[Figure: Feature Importance]
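A feature-importance ranking of this kind could be produced from a fitted random forest, roughly as sketched here. Whether the project uses the random forest or the decision tree for this chart is not stated, so the model choice is an assumption, as is the `rank_feature_importance` helper. The demo data is synthetic, with the label built from the first column only, so its ranking is true by construction and is not evidence about the real dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Final feature list as given in this README.
FEATURES = ["form_last5", "is_home", "opponent_encoded", "strength_encoded", "opp_table_position"]


def rank_feature_importance(X, y, feature_names=FEATURES, seed=0):
    """Fit a random forest and return (feature, importance) pairs, largest first."""
    forest = RandomForestClassifier(n_estimators=100, random_state=seed)
    forest.fit(X, y)
    pairs = zip(feature_names, forest.feature_importances_)
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Synthetic placeholder data: the label depends only on the first column.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, len(FEATURES)))
y_demo = (X_demo[:, 0] > 0).astype(int)

ranking = rank_feature_importance(X_demo, y_demo)
for name, importance in ranking:
    print(f"{name}: {importance:.3f}")
```

Impurity-based importances can favor features with many distinct values, such as an encoded opponent column, so a permutation-based check is a reasonable complement.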

⚔️ Win Rate vs Opponent Strength

[Figure: Win vs Strength]

🏟️ Home vs Away Performance

[Figure: Home vs Away]
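The win-rate breakdowns behind the two charts above (by opponent strength and by venue) can be computed with one grouping helper. This is a sketch using the standard library; `win_rate_by` is a hypothetical name, and it assumes `result` holds `W`, `D`, or `L`.

```python
from collections import defaultdict


def win_rate_by(matches, key):
    """Return {group value: share of matches won} for the given column."""
    totals = defaultdict(lambda: [0, 0])  # group -> [wins, matches played]
    for match in matches:
        counts = totals[match[key]]
        counts[0] += 1 if match["result"] == "W" else 0
        counts[1] += 1
    return {group: wins / played for group, (wins, played) in totals.items()}
```

For example, `win_rate_by(matches, "opp_strength_bucket")` gives one win rate per strength category, and `win_rate_by(matches, "venue")` gives the home and away split.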


🔍 Key Insights

  • Recent form (last 5 matches) is the strongest predictor
  • Home advantage significantly impacts results
  • Performance drops against stronger opponents
  • Matches cluster into distinct performance patterns

🚀 How to Run

pip install -r requirements.txt
python main.py


🚀 Future Work

This project will be extended with a player-level dataset, enabling:

  • Player impact modeling
  • Best XI combinations
  • Tactical pattern extraction
  • Advanced prediction models

⭐ Author

Anant Jain

