Dimensional modeling is a way of designing databases specifically for analytics and reporting.
It is mainly used in:
Instead of organizing data for transactions (like banking apps), dimensional modeling organizes data to answer questions like:
1️⃣ Why Do We Need Dimensional Modeling?
There are two main types of databases:
TypePurposeTransactional (OLTP)Record daily operations (orders, payments, updates)Analytical (OLAP)Analyze data and generate reports
Transactional databases are usually highly normalized (many small tables).
But analytics needs:
Dimensional modeling solves this.
2️⃣ The Core Idea: Facts and Dimensions
Dimensional modeling organizes data into two main types of tables:
Table TypeWhat It StoresFact TableMeasurable events (numbers)Dimension TableDescriptive context (text, categories, details)
Think of it like this:
Fact = “Something happened”
Dimensions = “Details about what happened”
3️⃣ Fact Tables (The Numbers)
A fact table stores measurable values.
Example: Sales Fact Table
| sale_id | date_key | product_key | customer_key | quantity | total_amount | |---------|----------|------------|--------------|----------|-------------| | 1 | 20240101 | 101 | 501 | 2 | 200 | | 2 | 20240102 | 102 | 502 | 1 | 150 |
Notice:
Each row represents one sales event.
This is called defining the grain.
4️⃣ What is “Grain”?
Grain defines:
“What does one row in the fact table represent?”
Examples of grain:
| Example Business | Grain | |------------------|-------| | Sales system | One row per sale | | Banking system | One row per transaction | | Inventory system | One row per product per day |
Choosing the correct grain is extremely important.
5️⃣ Dimension Tables (The Context)
Dimension tables describe the facts.
Example: Product Dimension
| product_key | product_name | category | brand | |------------|--------------|--------------|--------| | 101 | Laptop | Electronics | Dell | | 102 | Phone | Electronics | Apple |
Example: Date Dimension
| date_key | full_date | month | year | quarter | |----------|------------|-------|------|---------| | 20240101 | 2024-01-01 | Jan | 2024 | Q1 | | 20240102 | 2024-01-02 | Jan | 2024 | Q1 |
Dimensions answer:
6️⃣ Star Schema (Most Common Structure)
The most common dimensional model is the ⭐ Star Schema.
Structure:
Date Dimension
|
Customer --- Sales Fact --- Product
|
Store Dimension
It’s called a star because:
Advantages:
7️⃣ Snowflake Schema
A Snowflake Schema is a more normalized version of star schema.
Example:
Instead of:
Product → Category
We split it into:
Product → Category → Department
Example:
Product Dimension | product_key | product_name | category_key | |------------|--------------|--------------| Category Dimension | category_key | category_name | department_key | Department Dimension | department_key | department_name |
Snowflake saves space but makes queries more complex.
8️⃣ Types of Facts
Facts can behave differently when aggregated.
| Type | Meaning | Example | |------|----------|----------| | Additive | Can sum across all dimensions | Sales amount | | Semi-additive | Cannot sum across time | Account balance | | Non-additive | Cannot sum at all | Percentage |
Example:
9️⃣ Simple SQL Example
Let’s say we want:
“Total sales by product category”
SQL:
SELECT
p.category,
SUM(f.total_amount) AS total_sales
FROM sales_fact f
JOIN product_dimension p
ON f.product_key = p.product_key
GROUP BY p.category;
This is simple because of dimensional modeling.
🔟 Step-by-Step Design Process
Here is the standard workflow:
1. Choose the business process (Sales, Inventory, Orders) 2. Define the grain (what one row means) 3. Identify dimensions (who, what, when, where) 4. Identify facts (numeric measures) 5. Design star schema 6. Build ETL process 7. Load into data warehouse
1️⃣1️⃣ Slowly Changing Dimensions (Basic Idea)
Sometimes dimension data changes.
Example:
Customer moves to another city.
We may want to:
Common types:
| Type | Behavior | |------|----------| | Type 1 | Overwrite old value | | Type 2 | Add new row (keep history) | | Type 3 | Add new column for previous value |
1️⃣2️⃣ Where Dimensional Modeling is Used
Dimensional modeling is used inside modern data warehouses such as:
These systems are optimized for large analytical queries.
1️⃣3️⃣ Complete Simple Example (SQL Schema Creation)
Below is a simple implementation.
-- Create Product Dimension
CREATE TABLE product_dimension (
product_key INT PRIMARY KEY,
product_name VARCHAR(100),
category VARCHAR(50),
brand VARCHAR(50)
);
-- Create Date Dimension
CREATE TABLE date_dimension (
date_key INT PRIMARY KEY,
full_date DATE,
month VARCHAR(10),
year INT,
quarter VARCHAR(5)
);
-- Create Sales Fact Table
CREATE TABLE sales_fact (
sale_id INT PRIMARY KEY,
date_key INT,
product_key INT,
quantity INT,
total_amount DECIMAL(10,2),
FOREIGN KEY (date_key) REFERENCES date_dimension(date_key),
FOREIGN KEY (product_key) REFERENCES product_dimension(product_key)
);
1️⃣4️⃣ Full Code Compilation (All SQL Together)
-- Create Product Dimension
CREATE TABLE product_dimension (
product_key INT PRIMARY KEY,
product_name VARCHAR(100),
category VARCHAR(50),
brand VARCHAR(50)
);
-- Create Date Dimension
CREATE TABLE date_dimension (
date_key INT PRIMARY KEY,
full_date DATE,
month VARCHAR(10),
year INT,
quarter VARCHAR(5)
);
-- Create Sales Fact Table
CREATE TABLE sales_fact (
sale_id INT PRIMARY KEY,
date_key INT,
product_key INT,
quantity INT,
total_amount DECIMAL(10,2),
FOREIGN KEY (date_key) REFERENCES date_dimension(date_key),
FOREIGN KEY (product_key) REFERENCES product_dimension(product_key)
);
-- Example Query
SELECT
p.category,
SUM(f.total_amount) AS total_sales
FROM sales_fact f
JOIN product_dimension p
ON f.product_key = p.product_key
GROUP BY p.category;