1️⃣ What is Unsupervised Learning?
Unsupervised Learning is a type of Machine Learning where:
The model learns patterns from unlabeled data.
There is no correct output (Y).
Unlike supervised learning:
| Supervised | Unsupervised |
|------------|--------------|
| Has labels | No labels |
| Predicts output | Finds hidden patterns |
| Regression & Classification | Clustering & Dimensionality Reduction |
Example:
You give the model customer purchase data.
It groups customers automatically based on similarity.
2️⃣ Clustering
Clustering means:
Grouping similar data points together.
Example: grouping customers with similar purchase behavior into segments, without telling the model in advance what the segments are.
🔹 Popular Clustering Algorithms
| Algorithm | Description |
|-----------|-------------|
| K-Means | Groups data into K clusters |
| Hierarchical Clustering | Builds a cluster tree |
| DBSCAN | Density-based clustering |
| Gaussian Mixture Model (GMM) | Probabilistic clustering |
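To illustrate the probabilistic entry in the table: a minimal GMM sketch using scikit-learn's `GaussianMixture` on synthetic data. Unlike K-Means, a GMM gives each point a *probability* of belonging to each cluster (soft assignment):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Synthetic 2-D data with 3 true groups
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# Fit a GMM with 3 components
gmm = GaussianMixture(n_components=3, random_state=42)
gmm.fit(X)

probs = gmm.predict_proba(X)   # per-point membership probabilities
labels = gmm.predict(X)        # hard labels (argmax of probs)

print(probs.shape)             # (300, 3)
print(probs[0].sum())          # each row sums to 1
```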
🔹 2.1 K-Means (Most Popular)
K-Means works like this:
1. Choose the number of clusters K.
2. Initialize K cluster centers (centroids), typically at random.
3. Assign each point to its nearest centroid.
4. Recompute each centroid as the mean of its assigned points.
5. Repeat steps 3–4 until assignments stop changing.
Goal:
Minimize within-cluster variance.
Mathematically:
Minimize J = Σᵢ Σ_{x ∈ Cᵢ} ||x − μᵢ||², the sum of squared distances from each point x to the center μᵢ of its cluster Cᵢ.
🔹 Choosing K (Elbow Method)
We calculate inertia (error) for different K values and look for the “elbow point”.
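The elbow method can be sketched as follows: fit K-Means for a range of K values and record the inertia. Inertia always decreases as K grows, so we look for the point where the drop flattens out:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

# Inertia (within-cluster sum of squares) for K = 1..8
inertias = []
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=42)
    km.fit(X)
    inertias.append(km.inertia_)

for k, inertia in zip(range(1, 9), inertias):
    print(f"K={k}: inertia={inertia:.1f}")
# The "elbow" here should appear around K=4, the true number of blobs
```

Plotting K against inertia (e.g. with `plt.plot(range(1, 9), inertias)`) makes the elbow easy to spot visually.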
3️⃣ Hierarchical Clustering
Two types:
| Type | Description |
|------|-------------|
| Agglomerative | Bottom-up (merge clusters) |
| Divisive | Top-down (split clusters) |
It creates a dendrogram (tree diagram).
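A minimal agglomerative example using SciPy: `linkage` builds the merge tree bottom-up, and `fcluster` cuts it into a chosen number of flat clusters. (The dataset here is synthetic; `scipy.cluster.hierarchy.dendrogram(Z)` would draw the tree diagram.)

```python
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=50, centers=3, random_state=42)

# Agglomerative (bottom-up) merging using Ward's criterion
Z = linkage(X, method="ward")

# Cut the tree into 3 flat clusters
labels = fcluster(Z, t=3, criterion="maxclust")
print(sorted(set(labels)))  # [1, 2, 3]
```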
4️⃣ DBSCAN
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) groups points based on density: regions where many points are packed closely together become clusters, while points in sparse regions are labeled as noise.
Advantages:
- Finds clusters of arbitrary shape (not just round ones)
- Marks low-density points as noise, giving built-in outlier detection
- Does not require choosing the number of clusters in advance
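A short sketch on the classic two-moons dataset, where the clusters are crescent-shaped and K-Means would fail. The label `-1` marks noise points; `eps` and `min_samples` are illustrative values for this dataset:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two interleaving half-circles: non-spherical clusters
X, _ = make_moons(n_samples=300, noise=0.05, random_state=42)

db = DBSCAN(eps=0.2, min_samples=5)
labels = db.fit_predict(X)

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
print("Clusters found:", n_clusters)
print("Noise points:", n_noise)
```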
5️⃣ Dimensionality Reduction
Dimensionality reduction means:
Reducing number of features while keeping important information.
Example:
100 features → 2 features
Why?
- Faster training and lower memory use
- Removes noisy or redundant features
- Avoids the curse of dimensionality
- Enables 2D/3D visualization of high-dimensional data
🔹 Common Dimensionality Reduction Algorithms
| Algorithm | Type |
|-----------|------|
| PCA (Principal Component Analysis) | Linear |
| t-SNE | Nonlinear |
| UMAP | Nonlinear |
| Autoencoders | Neural Network |
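As an example of a nonlinear method from the table, here is a t-SNE sketch that reduces scikit-learn's 64-dimensional digits dataset to 2D for visualization (t-SNE is for visualization only, not as a preprocessing step for other models):

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# 8x8 digit images flattened to 64 features
digits = load_digits()
X = digits.data  # shape (1797, 64)

tsne = TSNE(n_components=2, perplexity=30, random_state=42)
X_2d = tsne.fit_transform(X)

print("Before:", X.shape)   # (1797, 64)
print("After:", X_2d.shape)  # (1797, 2)
```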
6️⃣ PCA (Principal Component Analysis)
PCA:
- Is a linear technique
- Projects the data onto new axes (principal components), ordered by how much variance each captures
- Keeps the top components and discards the rest
Intuition:
Find new axes that capture most of the information (variance) in the data.
7️⃣ Applications of Unsupervised Learning
| Application | Example |
|-------------|---------|
| Customer Segmentation | Marketing groups |
| Anomaly Detection | Fraud detection |
| Image Compression | Reduce size |
| Topic Modeling | Document grouping |
| Data Visualization | Reduce to 2D |
| Recommendation Systems | User similarity |
8️⃣ Evaluation in Unsupervised Learning
Since no labels exist, evaluation is harder.
| Metric | Used For |
|--------|----------|
| Inertia | K-Means |
| Silhouette Score | Clustering quality |
| Davies-Bouldin Index | Cluster separation |
Silhouette Score range: −1 to +1. Values near +1 indicate well-separated clusters, values near 0 indicate overlapping clusters, and negative values suggest points assigned to the wrong cluster.
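The metrics in the table above can be computed in a few lines with scikit-learn (synthetic data; higher silhouette is better, lower Davies-Bouldin is better):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score, davies_bouldin_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)
labels = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(X)

sil = silhouette_score(X, labels)        # in [-1, 1], higher is better
dbi = davies_bouldin_score(X, labels)    # >= 0, lower is better

print(f"Silhouette: {sil:.3f}")
print(f"Davies-Bouldin: {dbi:.3f}")
```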
9️⃣ Python Example — Clustering (K-Means)
🔹 Install Libraries
pip install numpy pandas scikit-learn matplotlib seaborn
🔹 K-Means Example
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
# Generate synthetic data
X, _ = make_blobs(n_samples=300, centers=4, random_state=42)
# Apply K-Means
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)  # fixed seed for reproducible results
kmeans.fit(X)
labels = kmeans.labels_
# Evaluation
print("Silhouette Score:", silhouette_score(X, labels))
# Plot
plt.scatter(X[:,0], X[:,1], c=labels)
plt.scatter(kmeans.cluster_centers_[:,0], kmeans.cluster_centers_[:,1],
s=200, c='red', marker='X')
plt.title("K-Means Clustering")
plt.show()
🔟 Python Example — PCA
from sklearn.decomposition import PCA
# Generate higher-dimensional data (5 features), since the blobs above
# are already 2-D and reducing 2 features to 2 would change nothing
X_high, _ = make_blobs(n_samples=300, centers=4, n_features=5, random_state=42)
# Reduce to 2 dimensions
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_high)
print("Original Shape:", X_high.shape)
print("Reduced Shape:", X_reduced.shape)
print("Explained variance ratio:", pca.explained_variance_ratio_)
plt.scatter(X_reduced[:,0], X_reduced[:,1])
plt.title("PCA Reduced Data")
plt.show()
1️⃣1️⃣ What You Should Understand
After this topic you should understand:
- The difference between supervised and unsupervised learning
- What clustering is and how K-Means, hierarchical clustering, and DBSCAN work
- How to choose K with the elbow method
- What dimensionality reduction is and how PCA works
- How to evaluate clusters without labels (inertia, silhouette score, Davies-Bouldin index)
1️⃣2️⃣ Full Compilation of All Code
# Install:
# pip install numpy pandas scikit-learn matplotlib seaborn
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.decomposition import PCA
# Generate synthetic data
X, _ = make_blobs(n_samples=300, centers=4, random_state=42)
# -------------------------
# K-MEANS CLUSTERING
# -------------------------
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)  # fixed seed for reproducible results
kmeans.fit(X)
labels = kmeans.labels_
print("Silhouette Score:", silhouette_score(X, labels))
plt.figure()
plt.scatter(X[:,0], X[:,1], c=labels)
plt.scatter(kmeans.cluster_centers_[:,0], kmeans.cluster_centers_[:,1],
s=200, c='red', marker='X')
plt.title("K-Means Clustering")
plt.show()
# -------------------------
# PCA DIMENSIONALITY REDUCTION
# -------------------------
# Generate higher-dimensional data (5 features) so the reduction is meaningful
X_high, _ = make_blobs(n_samples=300, centers=4, n_features=5, random_state=42)
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_high)
print("Original Shape:", X_high.shape)
print("Reduced Shape:", X_reduced.shape)
plt.figure()
plt.scatter(X_reduced[:,0], X_reduced[:,1])
plt.title("PCA Reduced Data")
plt.show()