Implementation of Machine Learning Algorithms from Scratch

Learn machine learning from basic to advanced, and develop machine learning models from scratch in Python.

Navigation

Useful Resources
Installation
Reality vs Expectation
Machine Learning from Beginner to Advanced
Scratch Implementation
Mathematical Implementation
Machine Learning Interview Questions with Answers
Essential Machine Learning Formulas
Practice Guide for Data Science Learning

Useful Resources

Title	Repository
USEFUL GIT COMMANDS	🔗
ML TOOL	🔗

Installation

Title	Repository
INSTALL THE ANACONDA PYTHON ON WINDOWS AND LINUX	🔗

Reality vs Expectation

Title	Repository
IS AI OVERHYPED? REALITY VS EXPECTATION	🔗

Machine Learning from Beginner to Advanced

Title	Repository
HISTORY OF MATHEMATICS, AI & ML - HISTORY & MOTIVATION	🔗
INTRODUCTION TO ARTIFICIAL INTELLIGENCE & MACHINE LEARNING	🔗
KEY TERMS USED IN MACHINE LEARNING	🔗
PERFORMANCE METRICS IN MACHINE LEARNING CLASSIFICATION MODEL	🔗
PERFORMANCE METRICS IN MACHINE LEARNING REGRESSION MODEL	🔗

Scratch Implementation

Title	Repository
LINEAR REGRESSION FROM SCRATCH	🔗
LOGISTIC REGRESSION FROM SCRATCH	🔗
NAIVE BAYES FROM SCRATCH	🔗
DECISION TREE FROM SCRATCH	🔗
RANDOM FOREST FROM SCRATCH	🔗
K NEAREST NEIGHBOUR	🔗
K MEANS CLUSTERING	🔗

Mathematical Implementation

Title	Repository
CONFUSION MATRIX FOR YOUR MULTI-CLASS ML MODEL	🔗
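
As a rough illustration of the idea behind a multi-class confusion matrix, here is a minimal pure-Python sketch. It is not taken from the linked notebook; the function names and the toy labels are assumptions made for this example. Rows are true classes, columns are predicted classes, and per-class precision and recall are read off the column and row sums.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Return a matrix whose rows are true classes and columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_precision_recall(matrix):
    """Compute (precision, recall) for each class from the confusion matrix."""
    n = len(matrix)
    results = []
    for k in range(n):
        tp = matrix[k][k]
        predicted_k = sum(matrix[r][k] for r in range(n))  # column sum
        actual_k = sum(matrix[k])                          # row sum
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        results.append((precision, recall))
    return results


# Hypothetical labels, for illustration only.
labels = ["cat", "dog", "bird"]
y_true = ["cat", "cat", "dog", "dog", "bird", "bird"]
y_pred = ["cat", "dog", "dog", "dog", "bird", "cat"]

cm = confusion_matrix(y_true, y_pred, labels)
scores = per_class_precision_recall(cm)
for label, (precision, recall) in zip(labels, scores):
    print(f"{label}: precision={precision:.2f} recall={recall:.2f}")
```

Each diagonal entry counts correct predictions for one class; everything off the diagonal is a specific kind of confusion between two classes.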

Essential Machine Learning Formulas

Title	Repository
MOSTLY USED MACHINE LEARNING FORMULAS	🔗

Practice Guide for Data Science Learning

Title	Repository
Research Guide for FYP	🔗
The Intermediate Guide to 180 Days Data Science Learning Plan	🔗

Algorithm Pros and Cons

K-Nearest Neighbors
✔ Simple, No training phase, No assumptions about the data, Easy to implement, New data can be added seamlessly, Only one hyperparameter (k)
✖ Doesn't work well in high dimensions, Sensitive to noisy data, missing values and outliers, Doesn't work well with large data sets (the cost of calculating distances is high), Needs feature scaling, Doesn't work well on imbalanced data
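
To make the "needs feature scaling" point concrete, here is a small pure-Python sketch. The data set (age in years, income in dollars), the helper names, and the choice of min-max scaling are all assumptions made for this example. Without scaling, the income column dominates the Euclidean distance and the nearest neighbour changes.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=1):
    """Classify `query` by majority vote among its k nearest training points."""
    order = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], query))
    votes = [train_y[i] for i in order[:k]]
    return max(set(votes), key=votes.count)


def min_max_scale(rows, reference):
    """Scale each column of `rows` to [0, 1] using the ranges found in `reference`."""
    cols = list(zip(*reference))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(v - lo) / s for v, lo, s in zip(row, lows, spans)] for row in rows]


# Hypothetical toy data: [age, income]; the label depends on age only.
train_X = [[25, 40000], [30, 60000], [60, 42000], [65, 58000]]
train_y = ["no", "no", "yes", "yes"]
query = [62, 59500]

raw_prediction = knn_predict(train_X, train_y, query)

scaled_train = min_max_scale(train_X, train_X)
scaled_query = min_max_scale([query], train_X)[0]
scaled_prediction = knn_predict(scaled_train, train_y, scaled_query)

print(raw_prediction, scaled_prediction)
```

On the raw features the closest point is the 30-year-old with a similar income; after scaling, both features contribute comparably and the closest point is the 65-year-old.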
Decision Tree
✔ Doesn't require standardization or normalization, Easy to implement, Can handle missing values, Automatic feature selection
✖ High variance, Higher training time, Can become complex, Can easily overfit
Random Forest
✔ Out-of-bag (left-out) samples can be used for validation, High accuracy, Provides feature importance estimates, Can handle missing values, Doesn't require feature scaling, Good performance on imbalanced datasets, Can handle large datasets, Outliers have little impact, Less overfitting than a single decision tree
✖ Less interpretable, More computational resources, Prediction time high
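
The "left-out data" advantage comes from bootstrap sampling: each tree is trained on a sample drawn with replacement, so some rows are never drawn and can serve as a validation set for that tree. The sketch below shows only this sampling step, under the assumption of plain uniform bootstrap sampling; `bootstrap_with_oob` is a hypothetical helper name, not part of any library.

```python
import random


def bootstrap_with_oob(n_samples, seed=0):
    """Draw one bootstrap sample of row indices and return (in_bag, out_of_bag)."""
    rng = random.Random(seed)
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in drawn]
    return in_bag, out_of_bag


in_bag, oob = bootstrap_with_oob(1000)
oob_fraction = len(oob) / 1000  # tends towards 1/e (about 0.37) as n grows
print(f"out-of-bag fraction: {oob_fraction:.3f}")
```

A full random forest would repeat this for every tree and score each row only with the trees that did not see it.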
Linear Regression
✔ Simple, Interpretable, Easy to Implement
✖ Assumes a linear relationship between the features and the target, Sensitive to outliers
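
A quick way to see the sensitivity to outliers is to fit a line by ordinary least squares twice, once on clean data and once with a single corrupted point. This is a minimal single-feature sketch with made-up toy data; `fit_line` is a name chosen for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (single feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


xs = [0, 1, 2, 3, 4]
clean = [0, 1, 2, 3, 4]          # perfectly linear toy data
with_outlier = [0, 1, 2, 3, 20]  # last point replaced by an outlier

slope_clean, _ = fit_line(xs, clean)
slope_outlier, _ = fit_line(xs, with_outlier)
print(slope_clean, slope_outlier)
```

Because the squared error penalizes large residuals heavily, one extreme point moves the fitted slope from about 1 to about 4.2 on this toy data.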
Logistic Regression
✔ Doesn't assume a linear relationship between the features and the target itself (only between the features and the log-odds), Output can be interpreted as probability, Robust to noise
✖ Requires more data, Only effective when the classes are roughly linearly separable
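
The probability interpretation and the linear decision boundary both follow from the model's form: a linear score passed through the sigmoid. The sketch below uses hand-picked, hypothetical weights (not fitted to any data) purely to show the mechanics.

```python
import math


def sigmoid(z):
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """P(y = 1 | features) under a logistic regression model."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical, hand-picked parameters (not fitted to any data).
weights, bias = [1.5, -0.5], 0.0

p_boundary = predict_proba(weights, bias, [1.0, 3.0])  # linear score z = 0
p_positive = predict_proba(weights, bias, [2.0, 0.0])  # linear score z = 3
print(p_boundary, p_positive)
```

Points where the linear score is zero get probability 0.5, so the decision boundary is a straight line (a hyperplane in higher dimensions), which is why the method struggles when classes cannot be separated that way.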
Lasso Regression (L1)
✔ Prevents overfitting, Selects features by shrinking coefficients to zero
✖ Selected features will be biased, Prediction can be worse than Ridge
Ridge Regression (L2)
✔ Prevents overfitting
✖ Increases bias, Less interpretability
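
The difference between the two penalties is easiest to see in the simplified one-coefficient case (equivalently, an orthonormal design), where both have closed-form solutions. That simplification, the function names, and the example coefficients are assumptions for illustration; real solvers handle correlated features iteratively.

```python
def lasso_shrink(z, lam):
    """Soft-thresholding: minimizer of 0.5 * (z - b)**2 + lam * abs(b)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def ridge_shrink(z, lam):
    """Minimizer of 0.5 * (z - b)**2 + 0.5 * lam * b**2."""
    return z / (1.0 + lam)


# Hypothetical unpenalized coefficients, for illustration only.
ols_coefs = [3.0, 0.4, -0.2, -2.0]
lam = 0.5

lasso_coefs = [lasso_shrink(z, lam) for z in ols_coefs]
ridge_coefs = [ridge_shrink(z, lam) for z in ols_coefs]
print(lasso_coefs)
print(ridge_coefs)
```

Lasso sets the small coefficients exactly to zero (feature selection) but also pulls the surviving ones towards zero by `lam`, which is the bias mentioned above. Ridge shrinks every coefficient proportionally and never reaches exactly zero.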
AdaBoost
✔ Fast, Reduced bias, Little need to tune
✖ Vulnerable to noise, Can overfit
Gradient Boosting
✔ Good performance
✖ Harder to tune hyperparameters
XGBoost
✔ Less feature engineering required, Outliers have little impact, Can output feature importance, Handles large datasets, Good model performance, Less prone to overfitting
✖ Difficult to interpret, Harder to tune as there are numerous hyperparameters
SVM
✔ Performs well in higher dimensions, Excellent when classes are separable, Outliers have less impact
✖ Slow, Poor performance with overlapping classes, Selecting appropriate kernel functions can be tricky
Naïve Bayes
✔ Fast, Simple, Requires less training data, Scalable, Insensitive to irrelevant features, Good performance with high-dimensional data
✖ Assumes independence of features
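
The independence assumption is what keeps the model fast and simple: the class-conditional likelihood is just a product of per-feature probabilities. Below is a minimal sketch for categorical features. All probabilities are made up by hand for illustration, and `naive_bayes_posterior` is a hypothetical helper, not the repository's implementation.

```python
import math


def naive_bayes_posterior(priors, likelihoods, observed):
    """Return P(class | observed), assuming features are independent given the class."""
    log_scores = {}
    for cls, prior in priors.items():
        log_score = math.log(prior)
        for feature, value in observed.items():
            log_score += math.log(likelihoods[cls][feature][value])
        log_scores[cls] = log_score
    # Normalize in log space first to avoid underflow with many features.
    top = max(log_scores.values())
    unnormalized = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(unnormalized.values())
    return {c: v / total for c, v in unnormalized.items()}


# Hypothetical probabilities, chosen by hand for illustration.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"has_link": {True: 0.8, False: 0.2}, "has_greeting": {True: 0.1, False: 0.9}},
    "ham": {"has_link": {True: 0.2, False: 0.8}, "has_greeting": {True: 0.7, False: 0.3}},
}

posterior = naive_bayes_posterior(
    priors, likelihoods, {"has_link": True, "has_greeting": False}
)
print(posterior)
```

If two features are strongly correlated, their evidence is counted twice under this product, which is the practical cost of the independence assumption.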
Deep Learning
✔ Superb performance with unstructured data (images, video, audio, text)
✖ (Very) long training time, Many hyperparameters, Prone to overfitting

AI/ML Datasets

Source	Link
Google Dataset Search – A search engine for datasets	🔗
IBM’s collection of datasets for enterprise applications	🔗
Kaggle Datasets	🔗
Huggingface Datasets – A Python library for loading NLP datasets	🔗
A large list organized by application domain	🔗
Computer Vision Datasets (a really large list)	🔗
Datasetlist – Datasets by domain	🔗
OpenML – A search engine for curated datasets and workflows	🔗
Papers with Code – Datasets with benchmarks	🔗
Penn Machine Learning Benchmarks	🔗
VisualDataDiscovery (for Computer Vision)	🔗
UCI Machine Learning Repository	🔗
Roboflow Public Datasets for computer vision	🔗
