For this project, the Heart Failure Clinical Records dataset (source) and the Mice Protein Expression dataset (source) were selected for analysis. The first dataset contains the clinical records of 299 patients who experienced heart failure. The second dataset comprises expression levels of 77 proteins measured in the cerebral cortex of mice assigned to eight classes, drawn from control (n=38) and trisomic, i.e. Down syndrome model (n=34), animals exposed to context fear conditioning.
The primary objectives of this study were:
- Implementation of a classification pipeline
- Comprehensive data analysis and result interpretation
The classification pipeline consisted of three stages: (i) preprocessing, including handling of missing values (where present) and other relevant transformations; (ii) model comparison, in which different classification algorithms were evaluated; and (iii) optimization, in which hyperparameters were tuned to improve the performance of the selected model.
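The preprocessing and optimization stages can be illustrated with a minimal scikit-learn sketch. Everything specific here is an assumption made for illustration, not the project's actual configuration: the synthetic data stands in for the real datasets, and the median imputation, standard scaling, SVM classifier, parameter grid, and 5-fold cross-validation are simply one plausible set of choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (NOT the project's datasets): 300 samples, 12 features.
X, y = make_classification(n_samples=300, n_features=12, random_state=0)

# Blank out roughly 5% of the values so the imputation step has work to do.
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Stage (i): preprocessing lives inside the pipeline, so it is re-fitted on
# each training fold and never sees the held-out fold.
pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # assumed imputation strategy
    ("scale", StandardScaler()),
    ("clf", SVC()),                                # assumed "selected model"
])

# Stage (iii): hyperparameter tuning by exhaustive grid search (assumed grid).
param_grid = {"clf__C": [0.1, 1, 10], "clf__kernel": ["linear", "rbf"]}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)
print(best_params, round(test_accuracy, 3))
```

Wrapping the imputer and scaler in the same `Pipeline` as the classifier is a design choice that avoids leaking information from validation folds into the preprocessing statistics during the grid search.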
The classification algorithms used in this study were Decision Trees, k-Nearest Neighbors (k-NN), Support Vector Machines (SVM), linear classifiers trained with Stochastic Gradient Descent (SGD), Logistic Regression, and Artificial Neural Networks (ANNs).
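A model comparison over these six algorithm families might look like the following sketch. It assumes scikit-learn estimators with mostly default settings, synthetic stand-in data, and stratified 5-fold cross-validated accuracy as the comparison metric; none of these choices is confirmed by the project itself, and `MLPClassifier` is used here only as one simple representative of an ANN.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (NOT the project's datasets).
X, y = make_classification(n_samples=300, n_features=12, random_state=0)

# One estimator per algorithm family; hyperparameters are illustrative defaults.
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(),
    "SGD": SGDClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "ANN (MLP)": MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000,
                               random_state=0),
}

# The same stratified folds are reused for every model, so scores are comparable.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

mean_scores = {}
for name, model in models.items():
    # Scaling is part of each pipeline and is re-fitted per training fold.
    scores = cross_val_score(make_pipeline(StandardScaler(), model), X, y, cv=cv)
    mean_scores[name] = scores.mean()

# Print models from best to worst mean cross-validated accuracy.
for name, score in sorted(mean_scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} {score:.3f}")
```

On imbalanced data, such as a clinical outcome with few positive cases, a metric other than accuracy (for example `scoring="f1"` or `scoring="roc_auc"`) may be a more informative basis for the comparison.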
For data analysis, visualization techniques and machine learning (ML) methods were applied to aid interpretation of the data, and the classification results were likewise visualized to make them easier to understand.
In summary, this project has both theoretical and practical significance, as it addresses key challenges in the analysis of real-world biological data.