A machine learning project that predicts the outcome of professional soccer matches
(Home Win / Draw / Away Win) using historical match data and engineered performance metrics.
This project builds a supervised classification model to predict match results based on:
- Team form (rolling averages)
- Goals scored & conceded
- Home vs Away performance
- Head-to-head statistics
- Engineered rating features (e.g., goal differential, Elo rating)
The objective is to explore feature engineering, model selection, and evaluation in a real-world sports analytics setting.
Given two teams and their historical performance data, predict the match outcome:
- 🟢 Home Win
- 🟡 Draw
- 🔵 Away Win
This is a multi-class classification problem.
The dataset contains:
- Match date
- Home team / Away team
- Full-time goals
- Shots, shots on target
- Possession (if available)
- Rolling performance statistics
- Engineered features
Possible data sources:
- Kaggle soccer datasets
- API-Football
- Python
- Pandas
- NumPy
- Scikit-learn
- Matplotlib / Seaborn
Examples of features used:
- Rolling average goals scored (last 5 matches)
- Rolling average goals conceded
- Goal differential
- Home win percentage
- Away win percentage
- Head-to-head win ratio
- Elo rating difference (optional)
- Data Cleaning
- Feature Engineering
- Train/Test Split
- Model Training
- Cross-Validation
- Evaluation & Comparison
Since this is a multi-class classification problem:
- Accuracy
- Precision
- Recall
- F1 Score
- Confusion Matrix