Unsupervised learning (clustering, dimensionality reduction): algorithms, applications | Machine Learning Tutorial - Learn with VOKS

Unsupervised learning (clustering, dimensionality reduction): algorithms, applications


1️⃣ What is Unsupervised Learning?

Unsupervised Learning is a type of Machine Learning where:

The model learns patterns from unlabeled data.

There are no correct output labels (Y) to predict.

Unlike supervised learning:


| Supervised | Unsupervised |
|------------|-------------|
| Has labels | No labels |
| Predicts output | Finds hidden patterns |
| Regression & Classification | Clustering & Dimensionality Reduction |

Example:

You give the model customer purchase data.

It groups customers automatically based on similarity.

2️⃣ Clustering

Clustering means:

Grouping similar data points together.

Example:

  • Group similar customers
  • Group similar news articles
  • Group similar images

🔹 Popular Clustering Algorithms


| Algorithm | Description |
|------------|------------|
| K-Means | Groups data into K clusters |
| Hierarchical Clustering | Builds cluster tree |
| DBSCAN | Density-based clustering |
| Gaussian Mixture Model (GMM) | Probabilistic clustering |
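
The last entry differs from the others: a GMM gives each point a probability of belonging to every cluster rather than a single hard assignment. A minimal sketch using scikit-learn's `GaussianMixture` (the parameter choices here are illustrative):

```python
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Synthetic data: 4 blobs
X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

# Fit a mixture of 4 Gaussians
gmm = GaussianMixture(n_components=4, random_state=42).fit(X)

probs = gmm.predict_proba(X)   # soft assignments: each row sums to 1
labels = gmm.predict(X)        # hard assignment: most probable component

print("Probability matrix shape:", probs.shape)
```

Because the assignments are probabilistic, a point near a cluster boundary gets split probability instead of being forced into one group.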

🔹 2.1 K-Means (Most Popular)

K-Means works like this:

  1. Choose number of clusters (K)
  2. Randomly initialize centroids
  3. Assign points to nearest centroid
  4. Update centroids
  5. Repeat until convergence

Goal:

Minimize within-cluster variance.

Mathematically:

Minimize J = Σₖ Σ_{x ∈ Cₖ} ||x − μₖ||², the total squared distance from each point x to its cluster centroid μₖ (scikit-learn calls this quantity "inertia").

🔹 Choosing K (Elbow Method)

We compute the inertia (within-cluster sum of squared distances) for several values of K and look for the "elbow point", where adding more clusters stops reducing inertia sharply.
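
The elbow method can be sketched like this, reusing the same kind of synthetic blob data generated later in this tutorial:

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data: 4 well-separated blobs
X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

# Fit K-Means for K = 1..9 and record the inertia of each fit
inertias = []
for k in range(1, 10):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias.append(km.inertia_)

# The "elbow" is where the curve flattens (here, around K = 4)
plt.plot(range(1, 10), inertias, marker="o")
plt.xlabel("K (number of clusters)")
plt.ylabel("Inertia")
plt.title("Elbow Method")
plt.show()
```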


3️⃣ Hierarchical Clustering

Two types:


| Type | Description |
|------|------------|
| Agglomerative | Bottom-up (merge clusters) |
| Divisive | Top-down (split clusters) |

It creates a dendrogram (tree diagram).
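
A minimal agglomerative example that draws the dendrogram, using SciPy (installed automatically as a scikit-learn dependency):

```python
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from sklearn.datasets import make_blobs

# Small dataset so the dendrogram stays readable
X, _ = make_blobs(n_samples=50, centers=3, random_state=42)

# Agglomerative (bottom-up) merging with Ward linkage
Z = linkage(X, method="ward")

# Cut the tree into 3 flat clusters
labels = fcluster(Z, t=3, criterion="maxclust")

dendrogram(Z)
plt.title("Dendrogram (Ward linkage)")
plt.show()
```

Cutting the dendrogram at different heights yields different numbers of clusters, so K does not have to be fixed before fitting.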


4️⃣ DBSCAN

DBSCAN groups points based on density.

Advantages:

  • No need to choose K
  • Handles noise
  • Finds irregular-shaped clusters
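
These advantages show up clearly on non-spherical data. A short sketch on two interleaving half-moons (the `eps` and `min_samples` values follow common scikit-learn examples and would need tuning for other data):

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler

# Two half-moons: a shape K-Means cannot separate
X, _ = make_moons(n_samples=300, noise=0.05, random_state=42)
X = StandardScaler().fit_transform(X)

db = DBSCAN(eps=0.3, min_samples=5).fit(X)
labels = db.labels_  # the label -1 marks noise points

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = list(labels).count(-1)
print("Clusters found:", n_clusters)
print("Noise points:", n_noise)
```

Note that K was never specified: DBSCAN recovers the two moons from density alone.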

5️⃣ Dimensionality Reduction

Dimensionality reduction means:

Reducing number of features while keeping important information.

Example:

100 features → 2 features

Why?

  • Faster computation
  • Remove noise
  • Better visualization
  • Avoid curse of dimensionality

🔹 Common Dimensionality Reduction Algorithms


| Algorithm | Type |
|------------|------|
| PCA (Principal Component Analysis) | Linear |
| t-SNE | Nonlinear |
| UMAP | Nonlinear |
| Autoencoders | Neural Network |
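
PCA gets its own section in this tutorial; as a quick sketch of one nonlinear method from the table, t-SNE embeds high-dimensional points into 2D for visualization (the perplexity value here is an illustrative choice):

```python
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE

# 10-dimensional blobs: too many features to plot directly
X, _ = make_blobs(n_samples=300, centers=4, n_features=10, random_state=42)

# t-SNE learns a 2D embedding that preserves local neighborhoods
tsne = TSNE(n_components=2, perplexity=30, random_state=42)
X_2d = tsne.fit_transform(X)

print("Embedded shape:", X_2d.shape)
```

Unlike PCA, t-SNE has no `transform` for new data; it is a visualization tool, not a reusable projection.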

6️⃣ PCA (Principal Component Analysis)

PCA:

  • Finds directions of maximum variance
  • Projects data into lower dimension
  • Uses eigenvectors and covariance matrix

Intuition:

Find new axes that capture most information.

7️⃣ Applications of Unsupervised Learning


| Application | Example |
|------------|----------|
| Customer Segmentation | Marketing groups |
| Anomaly Detection | Fraud detection |
| Image Compression | Reduce size |
| Topic Modeling | Document grouping |
| Data Visualization | Reduce to 2D |
| Recommendation Systems | User similarity |

8️⃣ Evaluation in Unsupervised Learning

Since no labels exist, evaluation is harder.


| Metric | Used For |
|--------|----------|
| Inertia | K-Means |
| Silhouette Score | Clustering quality |
| Davies-Bouldin Index | Cluster separation |

Silhouette Score range:

  • -1 to 1
  • Closer to 1 → better clusters
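
All three metrics are available in scikit-learn; a minimal sketch on synthetic blobs:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score, silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

kmeans = KMeans(n_clusters=4, n_init=10, random_state=42).fit(X)
labels = kmeans.labels_

sil = silhouette_score(X, labels)
dbi = davies_bouldin_score(X, labels)

print("Inertia:", kmeans.inertia_)   # lower is better (K-Means only)
print("Silhouette:", sil)            # range -1 to 1, higher is better
print("Davies-Bouldin:", dbi)        # >= 0, lower is better
```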

9️⃣ Python Example — Clustering (K-Means)

🔹 Install Libraries


pip install numpy pandas scikit-learn matplotlib seaborn

🔹 K-Means Example


import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Generate synthetic data
X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

# Apply K-Means
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)  # fixed seed for reproducible results
kmeans.fit(X)
labels = kmeans.labels_

# Evaluation
print("Silhouette Score:", silhouette_score(X, labels))

# Plot
plt.scatter(X[:,0], X[:,1], c=labels)
plt.scatter(kmeans.cluster_centers_[:,0], kmeans.cluster_centers_[:,1],
            s=200, c='red', marker='X')
plt.title("K-Means Clustering")
plt.show()

🔟 Python Example — PCA


from sklearn.decomposition import PCA
from sklearn.datasets import make_blobs

# Generate data with more features, so the reduction is meaningful
# (the K-Means data above already has only 2 features)
X_high, _ = make_blobs(n_samples=300, centers=4, n_features=10, random_state=42)

# Reduce to 2 dimensions
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_high)

print("Original Shape:", X_high.shape)
print("Reduced Shape:", X_reduced.shape)

plt.scatter(X_reduced[:,0], X_reduced[:,1])
plt.title("PCA Reduced Data")
plt.show()

1️⃣1️⃣ What You Should Understand

After this topic you should understand:

  • Difference between supervised & unsupervised learning
  • What clustering is
  • How K-Means works
  • What dimensionality reduction is
  • How PCA works
  • Applications in real world

FULL COMPILATION OF ALL CODE


Example Code:
# Install:
# pip install numpy pandas scikit-learn matplotlib seaborn

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.decomposition import PCA

# Generate synthetic data
X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

# -------------------------
# K-MEANS CLUSTERING
# -------------------------
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)  # fixed seed for reproducible results
kmeans.fit(X)
labels = kmeans.labels_

print("Silhouette Score:", silhouette_score(X, labels))

plt.figure()
plt.scatter(X[:,0], X[:,1], c=labels)
plt.scatter(kmeans.cluster_centers_[:,0], kmeans.cluster_centers_[:,1],
            s=200, c='red', marker='X')
plt.title("K-Means Clustering")
plt.show()

# -------------------------
# PCA DIMENSIONALITY REDUCTION
# -------------------------
# Regenerate data with more features, so the reduction is meaningful
X_high, _ = make_blobs(n_samples=300, centers=4, n_features=10, random_state=42)

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_high)

print("Original Shape:", X_high.shape)
print("Reduced Shape:", X_reduced.shape)

plt.figure()
plt.scatter(X_reduced[:,0], X_reduced[:,1])
plt.title("PCA Reduced Data")
plt.show()