3 changes: 2 additions & 1 deletion ML from Scratch/K Means Clustering/README.md
@@ -163,7 +163,7 @@ isOptimal = True
  for centroid in self.centroids:
      original_centroid = previous[centroid]
      curr = self.centroids[centroid]
-     if np.sum((curr - original_centroid)/original_centroid * 100.0) > self.tolerance:
+     if np.sum(np.abs(curr - original_centroid)/original_centroid * 100.0) > self.tolerance:
          isOptimal = False
  if isOptimal:
      break
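The change above wraps the percentage shift in `np.abs` so that positive and negative per-dimension movements can no longer cancel each other out and falsely signal convergence. A minimal standalone sketch of this convergence test (the function name `has_converged` and the default `tolerance` value are illustrative, not from the repo):

```python
import numpy as np

def has_converged(previous, centroids, tolerance=0.0001):
    """Return True when no centroid moved more than `tolerance` percent."""
    for key in centroids:
        original = previous[key]
        curr = centroids[key]
        # np.abs stops opposite-signed per-dimension shifts cancelling out
        if np.sum(np.abs(curr - original) / original * 100.0) > tolerance:
            return False
    return True

prev = {0: np.array([1.0, 1.0]), 1: np.array([5.0, 5.0])}
curr = {0: np.array([1.0, 1.0]), 1: np.array([5.0 + 1e-9, 5.0])}
print(has_converged(prev, curr))  # tiny shift, treated as converged
```

Without the absolute value, a centroid that moved by `+0.5` in one dimension and `-0.5` in another would sum to a zero percentage change and stop the loop prematurely.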
@@ -259,3 +259,4 @@ The Elbow method is based on the principle that **“Sum of squares of distances
- K means has problems when data contains outliers
- As the number of dimensions increases, the difficulty in getting the algorithm to converge increases due to the curse of dimensionality
- If there is overlapping between clusters, k-means doesn’t have an intrinsic measure for uncertainty
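The Elbow method referenced in the hunk header above picks `k` where the within-cluster sum of squares (WCSS) stops dropping sharply. A rough, self-contained sketch (the `wcss` helper is illustrative, not the repo's implementation, and uses a crude fixed-iteration k-means):

```python
import numpy as np

def wcss(X, k, n_iters=50, seed=0):
    """Within-cluster sum of squares for a crude k-means run."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # assign each point to its nearest centroid
        labels = np.argmin(((X[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
        # recompute centroids, keeping the old one if a cluster went empty
        centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                              else centroids[j] for j in range(k)])
    return sum(((X[labels == j] - centroids[j]) ** 2).sum() for j in range(k))

# two well-separated blobs: WCSS collapses from k=1 to k=2, then flattens,
# so the "elbow" sits at k=2
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (50, 2)), rng.normal(5, 0.1, (50, 2))])
scores = [wcss(X, k) for k in (1, 2, 3)]
```

Plotting `scores` against `k` would show the sharp bend at `k = 2` that the Elbow method looks for.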

3 changes: 2 additions & 1 deletion ML from Scratch/KNN/README.md
@@ -189,10 +189,11 @@ Second Predictions -> iris_setosa

#### Cons

- - KNN performs well in a limited number of input variables. So, it’s really challenging to estimate the performance of new data as the number of variables increases. Thus, it is called the curse of dimensionality. In the modern scientific era, increasing quantities of data are being produced and collected. How, for target_class in machine learning, too much data can be a bad thing. At a certain level, additional features or dimensions will decrease the precision of a model, because more data has to be generalized. Thus, this is recognized as the “Curse of dimensionality”.
+ - KNN performs well in a limited number of input variables. So, it’s really challenging to estimate the performance of new data as the number of variables increases. Thus, it is called the curse of dimensionality. In the modern scientific era, increasing quantities of data are being produced and collected. However, in machine learning, too much data can be a bad thing. At a certain level, additional features or dimensions will decrease the precision of a model, because more data has to be generalized. Thus, this is recognized as the “Curse of dimensionality”.
- KNN requires normalized data, and the algorithm cannot handle missing values on its own.
- The biggest problem with KNN from scratch is choosing the right number of neighbors (the value of *k*).
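The normalization point above matters because Euclidean distance lets the feature with the largest numeric range dominate. A minimal min-max scaling sketch (the `min_max_scale` helper and the sample values are illustrative):

```python
import numpy as np

def min_max_scale(X):
    """Rescale each column to [0, 1] so no single feature dominates distance."""
    X = np.asarray(X, dtype=float)
    mins, maxs = X.min(axis=0), X.max(axis=0)
    return (X - mins) / (maxs - mins)

# one feature measured in centimetres, one in grams: before scaling, the
# gram column would swamp any Euclidean distance computed for KNN
X = np.array([[150.0, 50000.0],
              [160.0, 60000.0],
              [170.0, 80000.0]])
X_scaled = min_max_scale(X)
print(X_scaled[:, 0])  # first column now spans 0.0, 0.5, 1.0
```

After scaling, both columns contribute on comparable footing to the distances KNN computes.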

# 💥 ESSENCE OF THE KNN ALGORITHM IN ONE PICTURE!

![132229901-06f86d02-98c2-473a-a6ce-758701bb2bc5](https://user-images.githubusercontent.com/40186859/185749018-64da0bdc-4f22-492a-a2a1-824c48309fbb.jpg)

3 changes: 2 additions & 1 deletion ML from Scratch/Logistic Regression/README.md
@@ -165,7 +165,7 @@ So, our equation changes from finding a Max to Min, now we can solve this using

Our optimizer tries to minimize the loss function of our sigmoid, that is, it tries to minimize the error made by our model, and eventually finds a hyperplane with the lowest error. The loss function has the below equation:

- $[y*log(y_p) + (i - y)*log(1 - y_p)]$
+ $[y*log(y_p) + (1 - y)*log(1 - y_p)]$

where,
y = actual class value of a data point <br>
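The bracketed term above is the per-sample log-likelihood; minimizing its negative, averaged over the data, gives binary cross-entropy. A short sketch (the function name `binary_cross_entropy` and the sample probabilities are illustrative, not from the repo):

```python
import numpy as np

def binary_cross_entropy(y, y_p, eps=1e-12):
    """Mean negative log-likelihood for binary labels y and predicted probabilities y_p."""
    y_p = np.clip(y_p, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y * np.log(y_p) + (1 - y) * np.log(1 - y_p))

y = np.array([1, 0, 1, 0])
good = binary_cross_entropy(y, np.array([0.9, 0.1, 0.8, 0.2]))
bad = binary_cross_entropy(y, np.array([0.4, 0.6, 0.4, 0.6]))
# confident, correct probabilities yield the lower loss
```

Note the typo fix in the diff above (`i` to `1`) matters: only with `(1 - y)` does the second term activate for negative examples, which is exactly what the `(1 - y) * np.log(1 - y_p)` term in this sketch does.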
@@ -347,3 +347,4 @@ plt.show()
```
print(classification_report(y, model.predict(X)))
```