Load and Transforming Data | Power Bi Tutorial - Learn with VOKS
Back Next

Load and Transforming Data


Loading data → getting the data into your analysis tool
Transforming data → cleaning and reshaping the data so it’s useful

Think of it like cooking:

  • Loading data = buying groceries
  • Transforming data = washing, cutting, and preparing ingredients
  • Analysis = cooking the meal

Let’s break everything down clearly.


1️⃣ What Does “Loading Data” Mean?

Loading data means importing data from somewhere into a program so you can work with it.

Data can come from:

  • CSV files
  • Excel files
  • Databases
  • APIs
  • JSON files
  • Web scraping

In beginner data analysis, most people start with CSV files.


Example Dataset (students.csv)

name,age,grade,city
Alice,23,88,New York
Bob,25,92,Los Angeles
Charlie,22,85,Chicago
David,23,90,New York

2️⃣ Loading Data in Python (Using Pandas)

In Python, the most common library for data analysis is:

pandas

It allows you to work with data in table format called a DataFrame.


Step 1: Install pandas (if needed)

pip install pandas

Step 2: Load the CSV File

import pandas as pd

# Load CSV file
df = pd.read_csv("students.csv")

# Display first 5 rows
print(df.head())

What is a DataFrame?

A DataFrame is a table with rows and columns.

Here’s what the data looks like:

| name    | age | grade | city        |
|---------|-----|-------|------------|
| Alice   | 23  | 88    | New York   |
| Bob     | 25  | 92    | Los Angeles|
| Charlie | 22  | 85    | Chicago    |
| David   | 23  | 90    | New York   |

3️⃣ What Does “Transforming Data” Mean?

Raw data is usually messy.

Transformation means:

  • Cleaning data
  • Fixing errors
  • Changing formats
  • Creating new columns
  • Filtering rows
  • Grouping data
  • Aggregating (summing, averaging, etc.)

4️⃣ Common Data Transformations (With Examples)

A) Selecting Columns

df["name"]

Selecting multiple columns:

df[["name", "grade"]]

B) Filtering Rows

Example: Students older than 23

df[df["age"] > 23]

C) Creating a New Column

Example: Add “Pass” column (grade ≥ 90)

df["pass"] = df["grade"] >= 90

Result:

| name    | age | grade | city        | pass  |
|---------|-----|-------|------------|-------|
| Alice   | 23  | 88    | New York   | False |
| Bob     | 25  | 92    | Los Angeles| True  |
| Charlie | 22  | 85    | Chicago    | False |
| David   | 23  | 90    | New York   | True  |

D) Handling Missing Data

If data has missing values:

df.isnull().sum()

Remove missing rows:

df = df.dropna()

Fill missing values:

df["grade"] = df["grade"].fillna(df["grade"].mean())

E) Grouping and Aggregating

Example: Average grade per city

df.groupby("city")["grade"].mean()

Result:

| city         | average_grade |
|-------------|--------------|
| Chicago     | 85           |
| Los Angeles | 92           |
| New York    | 89           |

F) Sorting Data

Sort by grade descending:

df.sort_values(by="grade", ascending=False)

G) Renaming Columns

df.rename(columns={"grade": "final_grade"}, inplace=True)

5️⃣ Why Data Transformation is Important

Real-world data is:

  • Inconsistent
  • Incomplete
  • Messy
  • Duplicated
  • Wrong format

If you skip transformation:

  • Your analysis will be wrong
  • Your conclusions will be misleading
  • Your models will fail

Good analysts spend 70–80% of their time cleaning data.


6️⃣ Real-World Workflow

Here’s how loading and transforming usually works:


1. Import libraries
2. Load data
3. Inspect data (head, info, shape)
4. Clean missing values
5. Fix data types
6. Remove duplicates
7. Create new useful columns
8. Filter relevant data
9. Aggregate/group if needed
10. Analyze or visualize

7️⃣ Full Example Workflow (Step-by-Step Code)


import pandas as pd

# 1. Load data
df = pd.read_csv("students.csv")

# 2. Inspect data
print(df.head())
print(df.info())

# 3. Remove duplicates
df = df.drop_duplicates()

# 4. Handle missing values
df["grade"] = df["grade"].fillna(df["grade"].mean())

# 5. Create pass column
df["pass"] = df["grade"] >= 90

# 6. Filter students older than 22
filtered_df = df[df["age"] > 22]

# 7. Group by city and calculate average grade
city_avg = df.groupby("city")["grade"].mean()

# 8. Sort by grade
sorted_df = df.sort_values(by="grade", ascending=False)

print(city_avg)
print(sorted_df)

8️⃣ Summary (Simple Explanation)

Load and Transforming Data means:

StepMeaningLoadingImporting data into your programCleaningFixing errors and missing valuesTransformingChanging data structure or formatFilteringSelecting needed dataAggregatingSummarizing dataSortingOrganizing data

Complete Compilation of All Code Used

Here is everything combined into one script:

Example Code:
import pandas as pd

# Load CSV file
df = pd.read_csv("students.csv")

# Display first rows
print(df.head())

# Check info
print(df.info())

# Check missing values
print(df.isnull().sum())

# Remove duplicates
df = df.drop_duplicates()

# Fill missing grades with average
df["grade"] = df["grade"].fillna(df["grade"].mean())

# Select columns
names = df["name"]
name_grade = df[["name", "grade"]]

# Filter rows
older_students = df[df["age"] > 23]

# Create new column
df["pass"] = df["grade"] >= 90

# Group and aggregate
city_avg = df.groupby("city")["grade"].mean()

# Sort values
sorted_df = df.sort_values(by="grade", ascending=False)

# Rename column
df.rename(columns={"grade": "final_grade"}, inplace=True)

print(city_avg)
print(sorted_df)
Power Bi
Introduction to Power BI Core Features of Power BI Loading and Opening Existing Reports Communicating Key Metrics with Cards Interactivity and Detail — Slicers and Tables Slicers Cleaning Data Power query editor; renaming and re ordering of columns, finding anomalies Field Aggregation & Data Manipulation Transforming & Formatting Columns Formatting Currency Making maps with geographic data Visualization options; dashboards or reports, tables and scatter charts, bubble charts, KPIs, guage Conditional formatting Sorting, Removing Duplicates, and Plotting in Pandas Dax in power bi, context Dax formulas, date data bars, histogram and pie charts Load and Transforming Data Dimensional modeling Facts and dimensional table modeling Breaking tables into multiple tables Finding relationships between tables
All Courses
Advance AI Bootstrap C C++ Computer Vision Content Writing CSS Cyber Security Data Analysis Deep Learning Email Marketing Excel Figma HTML Java Script Machine Learning MySQLi Node JS PHP Power Bi Python Python for AI Python for Analysis React React Native SEO SMM SQL