This repository contains the backend code for the Breaking Bug event. The event is organized by IEEE Computer Society, Manipal University Jaipur.
Breaking Bug is an electrifying virtual showdown for tech enthusiasts and coding maestros: an exciting and challenging event where participants step into the shoes of skilled developers and problem-solvers. In this competition, their mission is to identify and fix bugs in a GitHub repository across three domains: Frontend, Backend, and Machine Learning (ML).
Prerequisites:
- pandas
- numpy
- matplotlib
- seaborn
- plotly
- scikit-learn (metrics, preprocessing, model selection, ensemble)
- xgboost
- scipy
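Before running the project, it can help to confirm that the listed packages are importable. The snippet below is a minimal sketch, not part of the repository: the `REQUIRED` list and the `missing_packages` helper are hypothetical names introduced here, and the only assumption about the packages is their usual import names (scikit-learn is imported as `sklearn`).

```python
from importlib.util import find_spec

# Import names for the prerequisites listed above.
# Assumption: scikit-learn is imported under the name "sklearn".
REQUIRED = [
    "pandas", "numpy", "matplotlib", "seaborn",
    "plotly", "sklearn", "xgboost", "scipy",
]


def missing_packages(names):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All prerequisites are importable.")
```

Any package reported as missing can then be installed with the package manager of your choice.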
| Model | Cross-Validation Accuracy | Test Accuracy |
|---|---|---|
| Logistic Regression | 0.5169 | 0.4565 |
| Gradient Boosting | 0.6329 | 0.5543 |
| KNeighbors Classifier | 0.5785 | 0.5435 |
| Decision Tree Classifier | 0.6026 | 0.5435 |
| Random Forest Classifier | 0.6498 | 0.5652 |
| AdaBoost Classifier | 0.5652 | 0.4783 |
| XGBoost Classifier | 0.6352 | 0.5978 |
| Support Vector Machine | 0.5942 | 0.5000 |
| Naive Bayes Classifier | 0.3696 | 0.3043 |
Best Model: XGBClassifier
Best Model Cross-Validation Accuracy: 0.3696
Best Model Test Accuracy: 0.5978
Points Distribution
The maximum attainable points for this project are 1000. The points are distributed as follows:
| Difficulty Level | Points |
|---|---|
| Very easy | 20 |
| Easy | 30 |
| Medium | 40 |
| Hard | 75 |
| Easter egg | 100 |
| Total | 1000 |
Dataset Columns
- id: Unique ID
- age: Age in years
- sex: Gender
- dataset: Location of data collection
- cp: Chest pain type
- trestbps: Resting blood pressure
- chol: Cholesterol measure
- fbs: Fasting blood sugar
- restecg: ECG observation at resting condition
- thalch: Maximum heart rate achieved
- exang: Exercise induced angina
- oldpeak: ST depression induced by exercise relative to rest
- slope: The slope of the peak exercise ST segment
- ca: Number of major vessels (0-3) colored by fluoroscopy
- thal: Thalassemia
- num: Target [0 = no heart disease; 1, 2, 3, 4 = stages of heart disease]
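As a rough illustration of how these columns might be prepared for modelling, the sketch below builds a tiny made-up frame with a subset of the column names, separates the `num` target, fills missing numeric values with the column median, and one-hot encodes text columns. The sample values, the string labels such as `"Male"`, and the `prepare` helper are all assumptions for illustration; they are not taken from the real dataset or from the repository's code.

```python
import pandas as pd

# Made-up rows using a subset of the column names above.
# Values and category labels are illustrative assumptions only.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "age": [63, 67, 41, 56],
    "sex": ["Male", "Male", "Female", "Male"],
    "cp": ["typical angina", "asymptomatic", "atypical angina", "non-anginal"],
    "trestbps": [145.0, 160.0, 130.0, None],
    "chol": [233.0, 286.0, 204.0, 236.0],
    "num": [0, 2, 0, 1],
})


def prepare(frame, target="num"):
    """Split features from the target, impute numeric gaps, encode text columns."""
    X = frame.drop(columns=["id", target])  # id carries no predictive signal
    y = frame[target]
    numeric = X.select_dtypes(include="number").columns
    X[numeric] = X[numeric].fillna(X[numeric].median())
    X = pd.get_dummies(X)  # one-hot encode the remaining text columns
    return X, y


X, y = prepare(df)

# Optional: collapse stages 1-4 into a single "disease present" label.
y_binary = (y > 0).astype(int)
print(X.shape, y.tolist(), y_binary.tolist())
```

Median imputation and one-hot encoding are just one reasonable choice here; other strategies may suit the real data better.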
Model Performance
| Model | Cross-Validation Accuracy | Test Accuracy |
|---|---|---|
| Logistic Regression | 0.5115 | 0.5109 |
| Gradient Boosting | 0.6396 | 0.5978 |
| KNeighbors Classifier | 0.5767 | 0.5870 |
| Decision Tree Classifier | 0.5840 | 0.5761 |
| AdaBoost Classifier | 0.6058 | 0.5978 |
| Random Forest | 0.6288 | 0.6739 |
| XGBoost Classifier | 0.6263 | 0.6413 |
| Support Vector Machine | 0.5877 | 0.5870 |
| Naive Bayes Classifier | 0.5780 | 0.5435 |
Best Model: XGBoost Classifier
Best Model Cross-Validation Accuracy: 0.6263
Best Model Test Accuracy: 0.6413
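A comparison like the one above can be produced by scoring each model with cross-validation on the training split and then evaluating it on a held-out test split. The sketch below is one way to do this with scikit-learn, under clear assumptions: it uses synthetic five-class data from `make_classification` instead of the real dataset, only three of the listed models, default hyperparameters, and it selects the best model by mean cross-validation accuracy. It is not the repository's implementation, and its numbers will not match the table.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic 5-class data standing in for the real dataset (an assumption).
X, y = make_classification(
    n_samples=400, n_features=12, n_informative=6,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree Classifier": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    # Mean 5-fold cross-validation accuracy on the training split only.
    cv_acc = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy").mean()
    # Refit on the full training split, then score on the held-out test split.
    test_acc = model.fit(X_train, y_train).score(X_test, y_test)
    results[name] = (cv_acc, test_acc)

# Pick the model with the highest cross-validation accuracy and report
# the scores that belong to that same model.
best_name = max(results, key=lambda n: results[n][0])
best_cv, best_test = results[best_name]
print(f"Best Model: {best_name}")
print(f"Best Model Cross-Validation Accuracy: {best_cv:.4f}")
print(f"Best Model Test Accuracy: {best_test:.4f}")
```

Reading both scores from the same `results` entry keeps the reported cross-validation and test accuracy tied to the model that was actually selected.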

