diff --git a/ML from Scratch/K Means Clustering/README.md b/ML from Scratch/K Means Clustering/README.md
index 3958820..9bac4b3 100644
--- a/ML from Scratch/K Means Clustering/README.md
+++ b/ML from Scratch/K Means Clustering/README.md
@@ -163,7 +163,7 @@ isOptimal = True
 for centroid in self.centroids:
     original_centroid = previous[centroid]
     curr = self.centroids[centroid]
-    if np.sum((curr - original_centroid)/original_centroid * 100.0) > self.tolerance:
+    if np.sum(np.abs(curr - original_centroid)/original_centroid * 100.0) > self.tolerance:
         isOptimal = False
 if isOptimal:
     break
@@ -259,3 +259,4 @@ The Elbow method is based on the principle that **“Sum of squares of distances
 - K means has problems when data contains outliers
 - As the number of dimensions increases, the difficulty in getting the algorithm to converge increases due to the curse of dimensionality
 - If there is overlapping between clusters, k-means doesn’t have an intrinsic measure for uncertainty
+
diff --git a/ML from Scratch/KNN/README.md b/ML from Scratch/KNN/README.md
index 74c2def..3914520 100644
--- a/ML from Scratch/KNN/README.md
+++ b/ML from Scratch/KNN/README.md
@@ -189,10 +189,11 @@ Second Predictions -> iris_setosa
 
 #### Cons
 
-- KNN performs well in a limited number of input variables. So, it’s really challenging to estimate the performance of new data as the number of variables increases. Thus, it is called the curse of dimensionality. In the modern scientific era, increasing quantities of data are being produced and collected. How, for target_class in machine learning, too much data can be a bad thing. At a certain level, additional features or dimensions will decrease the precision of a model, because more data has to be generalized. Thus, this is recognized as the “Curse of dimensionality”.
+- KNN performs well in a limited number of input variables. So, it’s really challenging to estimate the performance of new data as the number of variables increases. Thus, it is called the curse of dimensionality. In the modern scientific era, increasing quantities of data are being produced and collected. However, in machine learning, too much data can be a bad thing. At a certain level, additional features or dimensions will decrease the precision of a model, because more data has to be generalized. Thus, this is recognized as the “Curse of dimensionality”.
 - KNN requires data that is normalized and also the KNN algorithm cannot deal with the missing value problem.
 - The biggest problem with the KNN from scratch is finding the correct neighbor number.
 
 # 💥 ESSENCE OF THE KNN ALGORITHM IN ONE PICTURE!
 
 ![132229901-06f86d02-98c2-473a-a6ce-758701bb2bc5](https://user-images.githubusercontent.com/40186859/185749018-64da0bdc-4f22-492a-a2a1-824c48309fbb.jpg)
+
diff --git a/ML from Scratch/Logistic Regression/README.md b/ML from Scratch/Logistic Regression/README.md
index 8b611df..a6f8c8f 100644
--- a/ML from Scratch/Logistic Regression/README.md
+++ b/ML from Scratch/Logistic Regression/README.md
@@ -165,7 +165,7 @@ So, our equation changes form finding a Max to Min, now we can solve this using
 
 Our Optimizer tries to minimize the loss function of our sigmoid, by loss function I mean, it tries to minimize the error made by our model, and eventually finds a Hyper-Plane which has the lowest error. The loss function has the below equation:
 
-$[y*log(y_p) + (i - y)*log(1 - y_p)]$
+$[y*log(y_p) + (1 - y)*log(1 - y_p)]$
 
 where,
 
 y = actual class value of a data point
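The corrected term above, $y*log(y_p) + (1 - y)*log(1 - y_p)$, is the per-point log-likelihood; the optimizer minimizes its negative (binary cross-entropy). A minimal sketch of that loss under the assumption of NumPy arrays of labels and sigmoid outputs — the function name `binary_cross_entropy` and the `eps` clipping are illustrative additions, not code from the repo:

```python
import numpy as np

def binary_cross_entropy(y, y_p, eps=1e-12):
    # Negative mean of y*log(y_p) + (1 - y)*log(1 - y_p).
    # Clip predictions away from 0 and 1 so log() never sees 0.
    y_p = np.clip(y_p, eps, 1 - eps)
    return -np.mean(y * np.log(y_p) + (1 - y) * np.log(1 - y_p))

y = np.array([1.0, 0.0, 1.0])
y_p = np.array([0.9, 0.1, 0.8])
loss = binary_cross_entropy(y, y_p)  # small, since predictions match labels
```

The sign flip is the Max-to-Min conversion described earlier: maximizing the likelihood is the same as minimizing its negative, which gradient descent can do directly.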
@@ -347,3 +347,4 @@ plt.show()
 ```
 print(classification_report(y, model.predict(X)))
 ```
+
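The `np.abs` change in the K Means hunk matters because without it, positive and negative per-coordinate centroid shifts cancel inside `np.sum`, and the loop can report convergence while centroids are still moving. A sketch of that tolerance test, assuming centroids are kept in a dict of NumPy arrays as in the repo's loop — the helper name `centroids_converged` is hypothetical:

```python
import numpy as np

def centroids_converged(previous, current, tolerance=0.001):
    # True only when every centroid's summed percentage change is
    # within tolerance. np.abs stops a +0.1% shift on one coordinate
    # from cancelling a -0.1% shift on another.
    for key in current:
        curr = np.asarray(current[key], dtype=float)
        orig = np.asarray(previous[key], dtype=float)
        change = np.sum(np.abs(curr - orig) / orig * 100.0)
        if change > tolerance:
            return False
    return True

prev = {0: np.array([1.0, 2.0])}
curr = {0: np.array([1.001, 1.998])}  # shifted +0.1% and -0.1%
still_moving = not centroids_converged(prev, curr)
```

In this example the unsigned version of the sum would be exactly 0, falsely signalling convergence; with `np.abs` the change is 0.2%, so iteration correctly continues.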