Image Recognition | Deep Learning Tutorial - Learn with VOKS

Image Recognition


πŸ“Œ What Is Image Recognition?

Image recognition is the process of teaching a computer to:

  • Look at an image
  • Understand what is inside it
  • Classify or identify objects

Example:

  • Input β†’ Picture of a cat
  • Output β†’ β€œCat”

Modern image recognition is powered by deep learning, especially Convolutional Neural Networks (CNNs).


How Does Image Recognition Work?

At a high level:


1. Input Image
2. Feature Extraction (edges, shapes, patterns)
3. Classification
4. Output Prediction

Step 1: Image as Numbers

A computer does not see an image the way humans do.

It sees:

  • A grid of pixels
  • Each pixel has numbers

Example:

  • Grayscale image β†’ values from 0 to 255
  • RGB image β†’ 3 channels (Red, Green, Blue)

Example shape:


Image shape = (Height, Width, Channels)
Example = (224, 224, 3)
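The shapes above can be checked directly in PyTorch. This sketch builds a small grayscale image and an RGB image filled with random pixel values in the 0–255 range (the sizes 28×28 and 224×224 are just example choices):

```python
import torch

# A grayscale image: a single grid of pixels, values 0-255
gray = torch.randint(0, 256, (28, 28))       # (Height, Width)

# An RGB image: three channels (Red, Green, Blue)
rgb = torch.randint(0, 256, (224, 224, 3))   # (Height, Width, Channels)

print(gray.shape)  # torch.Size([28, 28])
print(rgb.shape)   # torch.Size([224, 224, 3])
```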

Step 2: Feature Extraction (CNN)

This is where Convolutional Neural Networks (CNNs) come in.

CNNs:

  • Detect edges
  • Detect textures
  • Detect shapes
  • Combine features into objects

Convolution Layer

A small filter (kernel) slides over the image.

Example 3Γ—3 filter:

[ 1  0 -1
  1  0 -1
  1  0 -1 ]

This filter detects vertical edges.

The filter moves across the image and creates a feature map.
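The vertical-edge filter above can be applied with `F.conv2d`. This sketch uses a tiny made-up 5×5 image whose left half is dark and right half is bright, so it contains exactly one vertical edge:

```python
import torch
import torch.nn.functional as F

# The 3x3 vertical-edge filter from above, shaped (out_ch, in_ch, H, W)
kernel = torch.tensor([[1., 0., -1.],
                       [1., 0., -1.],
                       [1., 0., -1.]]).reshape(1, 1, 3, 3)

# A toy image with a vertical edge: dark left half, bright right half
image = torch.tensor([[0., 0., 1., 1., 1.]] * 5).reshape(1, 1, 5, 5)

# Sliding the filter over the image produces a feature map
feature_map = F.conv2d(image, kernel)
print(feature_map.shape)  # torch.Size([1, 1, 3, 3])
print(feature_map)        # strong response at the edge, zero elsewhere
```

The feature map has large (here, negative) values where the filter's window straddles the dark-to-bright transition, and zeros over the flat regions.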


Pooling Layer

Reduces the size of the feature map.

Example:

  • MaxPooling(2Γ—2)
  • Takes maximum value in each 2Γ—2 region

Purpose:

  • Reduce computation
  • Prevent overfitting
  • Keep important features
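Max pooling is easy to see on a small made-up feature map. Each 2×2 region below collapses to its single largest value:

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(2, 2)  # 2x2 window, stride 2

# A 4x4 feature map with batch and channel dims added: (N, C, H, W)
fmap = torch.tensor([[1., 3., 2., 0.],
                     [4., 2., 1., 5.],
                     [0., 1., 3., 2.],
                     [2., 1., 0., 4.]]).reshape(1, 1, 4, 4)

pooled = pool(fmap)
print(pooled)  # the maximum of each 2x2 region: [[4, 5], [2, 4]]
```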

Typical CNN Architecture

Input Image
↓
Convolution
↓
ReLU
↓
Pooling
↓
Fully Connected Layer
↓
Softmax (Output)
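The pipeline above can be written almost literally as an `nn.Sequential`. This is a minimal sketch assuming a 3-channel 224×224 input and 10 output classes:

```python
import torch
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(3, 16, 3),             # Convolution
    nn.ReLU(),                       # ReLU
    nn.MaxPool2d(2, 2),              # Pooling
    nn.Flatten(),
    nn.Linear(16 * 111 * 111, 10),   # Fully Connected Layer
    nn.Softmax(dim=1),               # Softmax (Output)
)

x = torch.randn(1, 3, 224, 224)
probs = cnn(x)
print(probs.shape)       # torch.Size([1, 10])
print(probs.sum())       # probabilities sum to 1
```

In real training code the final `Softmax` is usually omitted, because `CrossEntropyLoss` applies it internally; it is included here to mirror the diagram.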

Popular CNN Architectures


| Architecture | Year | Key Idea | Created By |
|--------------|------|----------|------------|
| LeNet-5 | 1998 | First CNN for digits | Yann LeCun |
| AlexNet | 2012 | Deep CNN breakthrough | Alex Krizhevsky |
| VGG16 | 2014 | Very deep simple layers | Oxford |
| ResNet | 2015 | Skip connections | Microsoft |
| EfficientNet | 2019 | Scaled CNN design | Google |

πŸ”₯ Breakthrough Moment

The big breakthrough came in 2012, when AlexNet won the ImageNet (ILSVRC) competition.

This showed that deep learning could dramatically outperform traditional computer vision techniques.


Modern Image Recognition Systems

Used in:

  • Self-driving cars
  • Medical imaging
  • Face recognition
  • Security systems

Major AI research companies involved:

  • Google
  • OpenAI
  • Microsoft

Example: Simple CNN in PyTorch

import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, 3)  # RGB input
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(16 * 111 * 111, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = x.view(x.size(0), -1)
        x = self.fc1(x)
        return x

model = ImageClassifier()
print(model)

This model:

  • Takes 3-channel image
  • Applies convolution
  • Pools features
  • Classifies into 10 categories

Training Process

1. Load dataset (images + labels)
2. Forward pass
3. Compute loss
4. Backpropagation
5. Update weights
6. Repeat for many epochs
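The six steps above can be sketched as a loop. To keep the example self-contained it uses a tiny stand-in linear model and random data instead of a real image dataset; in practice you would iterate over a `DataLoader` of images and labels:

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 3)                      # stand-in for a CNN
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

inputs = torch.randn(8, 10)                   # 1. load dataset (dummy here)
labels = torch.randint(0, 3, (8,))

for epoch in range(5):                        # 6. repeat for many epochs
    outputs = model(inputs)                   # 2. forward pass
    loss = criterion(outputs, labels)         # 3. compute loss
    optimizer.zero_grad()
    loss.backward()                           # 4. backpropagation
    optimizer.step()                          # 5. update weights
    print(f"epoch {epoch}: loss {loss.item():.4f}")
```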

Loss Function for Image Recognition

For classification:

CrossEntropyLoss

This combines:

  • Softmax
  • Log loss
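That combination can be verified numerically: `CrossEntropyLoss` applied to raw logits gives the same value as taking the softmax by hand and then the negative log of the true class's probability.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0]])  # raw model outputs
target = torch.tensor([0])                 # true class index

# Built-in CrossEntropyLoss on the logits
ce = nn.CrossEntropyLoss()(logits, target)

# The same thing by hand: softmax, then negative log of the true class
probs = F.softmax(logits, dim=1)
manual = -torch.log(probs[0, target[0]])

print(ce.item(), manual.item())  # identical values
```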

How Large-Scale Systems Work

Large models:

  • Use millions of images
  • Use GPUs
  • Use data augmentation
  • Use pretrained models

Example pretrained models:

  • ResNet
  • EfficientNet
  • Vision Transformer

CNN vs Vision Transformer

| CNN | Vision Transformer |
|------|--------------------|
| Uses convolution filters | Uses self-attention |
| Strong local feature extraction | Better global understanding |
| Data-efficient | Needs large datasets |

Real-World Applications

  • Face Unlock (smartphones)
  • Cancer detection from X-rays
  • Traffic sign detection
  • Product recommendation via images

Full Combined Code Example

Below is a minimal training-ready example:

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F

# Simple CNN Model
class ImageClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, 3)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(16 * 111 * 111, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = x.view(x.size(0), -1)
        x = self.fc1(x)
        return x

# Initialize model
model = ImageClassifier()

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Dummy input (batch_size=1, 3 channels, 224x224)
dummy_input = torch.randn(1, 3, 224, 224)
dummy_target = torch.tensor([1])

# Forward
output = model(dummy_input)
loss = criterion(output, dummy_target)

# Backward
optimizer.zero_grad()
loss.backward()
optimizer.step()

print("Training step complete")