Unsupervised Learning

What is Unsupervised Learning?

Unsupervised learning is the setting where you have only input data (X) and no corresponding output variables (labels). The goal is to find structure in the data itself.

What is clustering?

Clustering is a machine learning technique that segregates data points into groups, called clusters, such that points in the same cluster are more similar to each other than to points in other clusters.

Top 5 Clustering Algorithms to know

K-Means Clustering
Agglomerative Hierarchical Clustering
Mean-Shift Clustering
Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
Expectation–Maximization (EM) Clustering using Gaussian Mixture Models (GMM)

What is K-Means Clustering?

K-Means clustering aims to partition n observations into k clusters, in which each observation belongs to the cluster with the nearest mean (the cluster center).
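The partitioning described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function names `nearest_center` and `kmeans`, the choice of random observations as initial centers, the stopping rule, and the toy data are all assumptions made for the example.

```python
import numpy as np

def nearest_center(X, centers):
    """Index of the closest center (squared Euclidean distance) for every row of X."""
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # shape (n, k)
    return dists.argmin(axis=1)

def kmeans(X, k, n_iters=100, seed=0):
    """Alternate between nearest-mean assignment and mean update."""
    rng = np.random.default_rng(seed)
    # Assumption: initial centers are k distinct observations drawn at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        labels = nearest_center(X, centers)
        # Move each center to the mean of its members; an empty cluster keeps its center.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # centers stopped moving, so assignments are stable
        centers = new_centers
    return nearest_center(X, centers), centers

# Toy data: two blobs of 50 points each (made up for illustration).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
               rng.normal(5.0, 0.5, size=(50, 2))])
labels, centers = kmeans(X, k=2)
print(centers)
```

Each pass over the data computes n × k distances, which is where the method's speed comes from. Changing `seed` changes the starting centers and can change the result.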

K-Means Clustering Algorithm

Pros

K-Means has the advantage that it is fast: each iteration only computes the distances between the points and the k cluster centers. The cost per iteration is O(n·k·d) for n points in d dimensions, so it is linear in the number of points n.

Cons

You have to choose the number of clusters (k) in advance.
K-Means starts from a random choice of cluster centers, so it may yield different clusterings on different runs. The results may therefore not be repeatable.

Determining the Optimal Number of Clusters: 3 Must-Know Methods

Elbow method
Average silhouette method
Gap statistic method
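The first two methods in the list can be sketched with NumPy alone. The helper names (`kmeans`, `inertia`, `best_of_restarts`, `mean_silhouette`), the number of restarts, and the three-blob toy data are assumptions made for this example; the gap statistic is left out to keep the sketch short. In practice a library implementation would normally be used.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Plain K-Means with random initial centers; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1), centers

def inertia(X, labels, centers):
    """Within-cluster sum of squared distances: the curve inspected in the elbow method."""
    return float(((X - centers[labels]) ** 2).sum())

def best_of_restarts(X, k, n_restarts=10):
    """Keep the lowest-inertia run, since random starts can give different results."""
    runs = [kmeans(X, k, seed=s) for s in range(n_restarts)]
    return min(runs, key=lambda run: inertia(X, *run))

def mean_silhouette(X, labels):
    """Average of s(i) = (b - a) / max(a, b) over all points (needs >= 2 clusters)."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    clusters = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # convention: a singleton cluster scores 0
        a = D[i, own].sum() / (n_own - 1)  # mean distance to own cluster, excluding i
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return float(scores.mean())

# Toy data: three blobs of 40 points each (made up for illustration).
rng = np.random.default_rng(0)
blob_means = np.array([[0.0, 0.0], [6.0, 0.0], [3.0, 6.0]])
X = np.vstack([rng.normal(m, 0.5, size=(40, 2)) for m in blob_means])

inertias, silhouettes = {}, {}
for k in range(1, 7):
    labels, centers = best_of_restarts(X, k)
    inertias[k] = inertia(X, labels, centers)
    if k >= 2:  # the silhouette is undefined for a single cluster
        silhouettes[k] = mean_silhouette(X, labels)

print("inertia by k:", inertias)  # look for the bend (elbow) in this curve
print("silhouette by k:", silhouettes)
print("k with highest silhouette:", max(silhouettes, key=silhouettes.get))
```

The elbow method reads the inertia curve by eye, looking for the k after which adding clusters stops paying off. The silhouette method picks the k with the highest average score.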

Hierarchical Clustering

It is a connectivity-based clustering model, built on the assumption that data points closer to each other in the data space are more similar than points lying far apart.

As the name suggests, hierarchical clustering builds a hierarchy of clusters, which can be studied by visualising a dendrogram.

Pros

Hierarchical clustering does not require us to specify the number of clusters in advance. Because we are building a tree, we can choose afterwards the number of clusters that looks best.
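The idea of building the tree once and choosing the number of clusters afterwards can be sketched as follows. This is a deliberately naive version: single linkage is an assumed choice (the text does not name a linkage), and `agglomerative_merges` and `cut_tree` are hypothetical helper names for this example.

```python
import numpy as np

def agglomerative_merges(X):
    """Naive single-linkage agglomerative clustering; returns the merge history."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    clusters = {i: [i] for i in range(len(X))}  # cluster id -> member indices
    merges = []  # (id_a, id_b, distance) in merge order: the tree, bottom-up
    while len(clusters) > 1:
        ids = list(clusters)
        best = None
        for p in range(len(ids)):
            for q in range(p + 1, len(ids)):
                a, b = ids[p], ids[q]
                # Single linkage: distance between the closest pair of members.
                d = D[np.ix_(clusters[a], clusters[b])].min()
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        clusters[a] = clusters[a] + clusters.pop(b)  # merge b into a
        merges.append((a, b, float(d)))
    return merges

def cut_tree(n_points, merges, n_clusters):
    """Replay the first n_points - n_clusters merges to obtain flat labels."""
    parent = list(range(n_points))

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    for a, b, _ in merges[: n_points - n_clusters]:
        parent[find(b)] = find(a)
    roots = sorted({find(i) for i in range(n_points)})
    return np.array([roots.index(find(i)) for i in range(n_points)])

# Toy 1-D data (made up): three tight groups around 0, 5 and 9.
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [9.0]])
merges = agglomerative_merges(X)

# The same tree answers "what if k = 3?" and "what if k = 2?" without re-clustering.
labels_3 = cut_tree(len(X), merges, 3)
labels_2 = cut_tree(len(X), merges, 2)
print(labels_3, labels_2)
```

The nested search over all cluster pairs at every merge step is what makes this naive approach expensive on large datasets.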

Cons

Lower efficiency: the naive agglomerative algorithm has a time complexity of O(n³), compared with the linear per-iteration cost of K-Means.

Principal Component Analysis

Principal Component Analysis (PCA) is a dimensionality reduction technique that projects the data into a lower-dimensional space.
PCA is commonly computed with Singular Value Decomposition (SVD), a matrix factorization method that decomposes a matrix into a product of three matrices.
PCA finds the top N principal components, which are the directions along which the data vary (spread out) the most. Intuitively, the more spread out the data are along a direction, the more information that direction carries, and the more important it is for recognising patterns in the dataset.
PCA can be used as a pre-step for data visualization, reducing high-dimensional data to 2D or 3D. An alternative dimensionality reduction technique is t-SNE.
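The points above can be sketched with `numpy.linalg.svd`. This is a minimal illustration under stated assumptions: the function name `pca`, its return values, and the synthetic data are made up for the example, and the data are centered before the decomposition.

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its top principal components via SVD of the centered data."""
    X_centered = X - X.mean(axis=0)
    # X_centered = U @ diag(S) @ Vt; the rows of Vt are the principal directions,
    # ordered by decreasing singular value (most spread first).
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained_variance = S ** 2 / (len(X) - 1)
    ratio = explained_variance / explained_variance.sum()
    projected = X_centered @ components.T
    return projected, components, ratio[:n_components]

# Synthetic 3-D data (made up): the spread differs a lot between the three axes.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([5.0, 1.0, 0.2])

Z, components, ratio = pca(X, n_components=2)
print(Z.shape)  # (200, 2): ready for a 2D scatter plot
print(ratio)    # share of total variance captured by each kept component
```

The variance ratio gives a rough sense of how much information is lost when only the first N components are kept.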
