1️⃣ What is Ensemble Learning?
Ensemble Learning means:
Combining multiple models to create a stronger model.
Instead of relying on one model, we combine many.
Think of it like asking a panel of experts instead of a single expert: each may err on their own, but the group's combined answer is usually more reliable.
2️⃣ Why Ensembles Work
Individual models may:
- Overfit the training data
- Make errors on different subsets of examples
- Be unstable (small changes in the data produce different predictions)
But combining models:
- Averages out individual errors
- Reduces variance
- Improves generalization
3️⃣ Types of Ensemble Methods
| Method | Idea |
|--------|------|
| Bagging | Train models independently in parallel |
| Boosting | Train models sequentially, correcting errors |
| Stacking | Combine predictions using another model |
4️⃣ Bagging (Bootstrap Aggregating)
Bagging works like this:
1. Draw several bootstrap samples from the training data.
2. Train one model on each sample, independently.
3. Aggregate their predictions (majority vote for classification, average for regression).
Goal:
Reduce variance.
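The three steps above can be sketched by hand (a minimal illustration using scikit-learn decision trees on the Iris dataset; the number of models and the split are arbitrary choices, not from the text):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

rng = np.random.default_rng(0)
n_models = 25
all_preds = []
for _ in range(n_models):
    # 1. Draw a bootstrap sample (sampling with replacement).
    idx = rng.integers(0, len(X_train), size=len(X_train))
    # 2. Train an independent model on that sample.
    tree = DecisionTreeClassifier(random_state=0).fit(X_train[idx], y_train[idx])
    all_preds.append(tree.predict(X_test))

# 3. Aggregate by majority vote across models.
all_preds = np.array(all_preds)  # shape: (n_models, n_test_points)
majority = np.array([np.bincount(col).argmax() for col in all_preds.T])
accuracy = (majority == y_test).mean()
print("Bagged trees accuracy:", accuracy)
```

This is essentially what `sklearn.ensemble.BaggingClassifier` does internally.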
🔹 Why “Bootstrap”?
Because we sample with replacement: some data points appear multiple times, some not at all.
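A quick sketch of how much data each bootstrap sample actually covers: when you draw n points with replacement, roughly 1 − 1/e ≈ 63% of the originals appear at least once (the n = 10,000 here is an arbitrary choice for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000
# Draw n indices with replacement, as a bootstrap sample would.
sample = rng.integers(0, n, size=n)
# Count how many distinct original points appear at least once.
unique_fraction = len(np.unique(sample)) / n
print(f"Fraction of points sampled at least once: {unique_fraction:.3f}")
# The remaining ~37% are "out-of-bag" points, usable for validation.
```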
5️⃣ Random Forest (Most Popular Bagging Method)
Random Forest = Bagging + Decision Trees.
Instead of one tree, Random Forest trains many trees, each on a bootstrap sample and each considering only a random subset of features at every split, then combines them by majority vote.
Random Forest reduces overfitting compared to a single tree.
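A rough comparison on a noisy synthetic dataset (the dataset and its parameters are illustrative, not from the text): the unpruned single tree memorizes the training set, while the forest typically generalizes better on held-out data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with label noise to make overfitting visible.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

print("Single tree train accuracy:", tree.score(X_train, y_train))
print("Single tree test accuracy: ", tree.score(X_test, y_test))
print("Random forest test accuracy:", forest.score(X_test, y_test))
```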
6️⃣ Boosting
Boosting works differently.
Instead of independent models, models are trained sequentially.
Each new model:
- Focuses on the examples the previous models got wrong
- Tries to correct the ensemble's remaining errors
Goal:
Reduce bias.
7️⃣ AdaBoost (Adaptive Boosting)
AdaBoost:
- Trains a sequence of weak learners (often decision stumps)
- Increases the weights of misclassified examples after each round
- Combines the learners with a weighted vote
It adapts to mistakes.
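A minimal sketch using scikit-learn's `AdaBoostClassifier` on the Iris dataset (the number of estimators and the split are arbitrary choices):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# By default the weak learner is a decision stump (depth-1 tree);
# each round reweights the training examples the ensemble got wrong.
ada = AdaBoostClassifier(n_estimators=50, random_state=0)
ada.fit(X_train, y_train)
print("AdaBoost accuracy:", ada.score(X_test, y_test))
```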
8️⃣ Gradient Boosting
Gradient Boosting:
- Fits each new model to the residual errors (the gradient of the loss) of the current ensemble
- Adds it with a small learning rate
Very powerful.
Popular implementations:
- XGBoost
- LightGBM
- CatBoost
Used heavily in:
- Tabular data problems
- Ranking and recommendation
- Machine learning competitions
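The core idea can be sketched from scratch for squared loss: each small tree is fit to the residuals of the current ensemble, and its predictions are added with a learning rate. (The dataset and hyperparameters below are illustrative, not from the text.)

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())  # start from a constant model
initial_mse = np.mean((y - prediction) ** 2)

for _ in range(100):
    # For squared loss, the negative gradient is just the residual.
    residuals = y - prediction
    stump = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    # Take a small corrective step toward fixing the remaining error.
    prediction += learning_rate * stump.predict(X)

final_mse = np.mean((y - prediction) ** 2)
print(f"Training MSE: {initial_mse:.3f} -> {final_mse:.3f}")
```

Each iteration lowers the training error; the learning rate controls how aggressively the ensemble chases residuals (and hence how easily it can overfit).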
9️⃣ Bagging vs Boosting
| Feature | Bagging | Boosting |
|----------|----------|----------|
| Training | Parallel | Sequential |
| Goal | Reduce variance | Reduce bias |
| Overfitting | Less prone | Can overfit |
| Example | Random Forest | Gradient Boosting |
🔟 Applications of Ensemble Methods
| Application | Example |
|-------------|----------|
| Fraud Detection | Banking systems |
| Medical Diagnosis | Disease prediction |
| Recommendation Systems | User ranking |
| Credit Scoring | Loan approval |
| Competitions | Kaggle winning models |
1️⃣1️⃣ Python Example — Random Forest (Bagging)
🔹 Install Libraries
```bash
pip install numpy pandas scikit-learn
```
🔹 Random Forest Classification
```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load data
data = load_iris()
X = data.data
y = data.target

# Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Model
rf = RandomForestClassifier(n_estimators=100)
rf.fit(X_train, y_train)

# Predict
predictions = rf.predict(X_test)
print("Random Forest Accuracy:", accuracy_score(y_test, predictions))
```
1️⃣2️⃣ Python Example — Gradient Boosting
```python
from sklearn.ensemble import GradientBoostingClassifier

# Reuses X_train, X_test, y_train, y_test and accuracy_score
# from the Random Forest example above.
gb = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1)
gb.fit(X_train, y_train)

predictions_gb = gb.predict(X_test)
print("Gradient Boosting Accuracy:", accuracy_score(y_test, predictions_gb))
```
1️⃣3️⃣ What You Should Understand
After this topic, you should know:
- What ensemble learning is and why combining models helps
- The difference between bagging, boosting, and stacking
- How Random Forest, AdaBoost, and Gradient Boosting work
- When to prefer variance reduction (bagging) vs bias reduction (boosting)
FULL COMPILATION OF ALL CODE
```python
# Install:
# pip install numpy pandas scikit-learn

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load data
data = load_iris()
X = data.data
y = data.target

# Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# -------------------------
# RANDOM FOREST
# -------------------------
rf = RandomForestClassifier(n_estimators=100)
rf.fit(X_train, y_train)
rf_predictions = rf.predict(X_test)
print("Random Forest Accuracy:", accuracy_score(y_test, rf_predictions))

# -------------------------
# GRADIENT BOOSTING
# -------------------------
gb = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1)
gb.fit(X_train, y_train)
gb_predictions = gb.predict(X_test)
print("Gradient Boosting Accuracy:", accuracy_score(y_test, gb_predictions))
```