Data Analysis is the process of collecting, cleaning, organizing, and examining data to discover useful information, draw conclusions, and support decision-making.
In simple terms:
Data analysis helps you turn raw numbers into meaningful insights.
For example:
The Data Analysis Process (Step-by-Step)
Data analysis usually follows these steps:
1️⃣ Define the Question
Before touching the data, decide exactly which question the analysis should answer.
Example:
“Which product generates the highest revenue?”
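To make this concrete, here is a minimal pandas sketch that answers the example question. The product names, quantities, and prices are hypothetical; with real data you would load your own sales records instead.

```python
import pandas as pd

# Hypothetical sales records (illustrative data, not from a real shop)
sales = pd.DataFrame({
    "Product": ["Pen", "Notebook", "Pen", "Bag", "Notebook", "Bag"],
    "Quantity": [10, 3, 5, 1, 4, 2],
    "UnitPrice": [1.5, 4.0, 1.5, 25.0, 4.0, 25.0],
})

# Revenue of each sale = quantity * unit price
sales["Revenue"] = sales["Quantity"] * sales["UnitPrice"]

# Total revenue per product, largest first
revenue_by_product = (
    sales.groupby("Product")["Revenue"].sum().sort_values(ascending=False)
)
print(revenue_by_product)

# The product with the highest total revenue
top_product = revenue_by_product.idxmax()
print("Highest revenue:", top_product)
```

With these numbers the answer is Bag: it sells few units, but each one is expensive. A clearly defined question tells you which columns to combine and how to summarize them.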
2️⃣ Collect the Data
Data can come from:
3️⃣ Clean the Data
Real-world data is messy. You may find:
Cleaning ensures the data is reliable.
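A minimal cleaning sketch with pandas, assuming a small hypothetical table with a few typical problems: a duplicated row, a missing value, inconsistent spacing and capitalization, and numbers stored as text. Filling the gap with the median is only one possible choice; depending on the data you might instead drop the row or use another rule.

```python
import pandas as pd

# Hypothetical messy data (illustrative only)
raw = pd.DataFrame({
    "Name": ["Alice", "Bob", "Bob", " charlie ", "David"],
    "Score": ["85", "90", "90", None, "88"],
})

clean = raw.copy()
# Normalize text first so near-duplicates become exact duplicates
clean["Name"] = clean["Name"].str.strip().str.title()
# Convert text to numbers; unparseable or missing entries become NaN
clean["Score"] = pd.to_numeric(clean["Score"], errors="coerce")
# Drop exact duplicate rows (Bob appears twice)
clean = clean.drop_duplicates()
# Fill the missing score with the median of the known scores
clean["Score"] = clean["Score"].fillna(clean["Score"].median())
# Renumber the rows after dropping duplicates
clean = clean.reset_index(drop=True)
print(clean)
```

The median is used here because, unlike the mean, it is not pulled strongly by a single extreme value.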
4️⃣ Explore the Data (Exploratory Data Analysis - EDA)
This step helps you understand:
You use:
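As a sketch of what exploration can look like in pandas, the calls below inspect the student dataset used later in this guide: column types, missing values, how one variable is distributed, and how two numerical variables move together. With only five rows the correlation is not meaningful; it simply shows the call.

```python
import pandas as pd

# The student dataset used later in this guide
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [23, 25, 22, 24, 23],
    "Score": [85, 90, 78, 88, 92],
})

# Column types and non-null counts (info() prints directly)
df.info()

# Number of missing values in each column
print(df.isna().sum())

# Distribution of one variable: how many students have each age
print(df["Age"].value_counts().sort_index())

# Correlation between two numerical variables (-1 to +1)
age_score_corr = df["Age"].corr(df["Score"])
print(f"Age vs Score correlation: {age_score_corr:.2f}")
```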
5️⃣ Analyze the Data
Apply methods like:
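Which methods you apply depends on the question. As a minimal sketch, here are two common ones, grouping with aggregation and computing a proportion, applied to the student dataset used later in this guide.

```python
import pandas as pd

# The student dataset used later in this guide
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [23, 25, 22, 24, 23],
    "Score": [85, 90, 78, 88, 92],
})

# Group comparison: average score for each age
avg_by_age = df.groupby("Age")["Score"].mean()
print(avg_by_age)

# Proportion: share of students scoring above 85
# (the mean of a True/False column is the fraction of True values)
share_above_85 = (df["Score"] > 85).mean()
print(f"{share_above_85:.0%} of students scored above 85")
```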
6️⃣ Communicate Results
Even a correct analysis has little value if the people who need it can’t understand the results.
You present:
Types of Data Analysis
There are four main types:
1. Descriptive Analysis
What happened?
Example:
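A minimal descriptive sketch, assuming hypothetical monthly sales totals for a small business: summarizing past numbers describes what happened without explaining why.

```python
import pandas as pd

# Hypothetical monthly sales totals (illustrative numbers)
sales = pd.Series(
    [1200, 1350, 980, 1500],
    index=["Jan", "Feb", "Mar", "Apr"],
    name="Sales",
)

print("Total sales:", sales.sum())
print("Average per month:", sales.mean())
print("Best month:", sales.idxmax())
print("Worst month:", sales.idxmin())
```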
2. Diagnostic Analysis
Why did it happen?
Example:
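A minimal diagnostic sketch, assuming hypothetical figures in which sales dipped in March and advertising spend is a suspected cause. A strong correlation is a clue worth investigating, not proof of cause and effect.

```python
import pandas as pd

# Hypothetical monthly figures: did sales fall when advertising was cut?
data = pd.DataFrame({
    "Month": ["Jan", "Feb", "Mar", "Apr"],
    "AdSpend": [300, 320, 150, 350],
    "Sales": [1200, 1350, 980, 1500],
})

# A value close to +1 means the two columns tend to rise and fall together
corr = data["AdSpend"].corr(data["Sales"])
print(f"Correlation between ad spend and sales: {corr:.2f}")
```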
3. Predictive Analysis
What will happen?
Example:
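A minimal predictive sketch, assuming hypothetical sales for four months and a roughly straight-line trend. It fits a line with NumPy's `polyfit` and extends it one month ahead; real forecasting usually needs more data and a model chosen to fit it.

```python
import numpy as np

# Hypothetical sales for months 1 to 4
months = np.array([1, 2, 3, 4])
sales = np.array([1000, 1100, 1250, 1350])

# Least-squares straight line: sales ≈ slope * month + intercept
# (polyfit returns coefficients from the highest power down)
slope, intercept = np.polyfit(months, sales, deg=1)

# Extend the trend to month 5
forecast = slope * 5 + intercept
print(f"Trend: {slope:.0f} more sales per month")
print(f"Forecast for month 5: {forecast:.0f}")
```

For these numbers the fitted line grows by 120 per month and predicts about 1475 for month 5.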
4. Prescriptive Analysis
What should we do?
Example:
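A toy prescriptive sketch, assuming hypothetical estimates of how many units would sell at each candidate price (in practice such estimates might come from a predictive model). It recommends the option with the highest expected revenue; real decisions often weigh costs and other constraints as well.

```python
# Hypothetical estimates: expected units sold at each candidate price
expected_demand = {8.0: 120, 10.0: 100, 12.0: 70}

# Expected revenue for each option = price * expected units
expected_revenue = {
    price: price * units for price, units in expected_demand.items()
}

# Recommend the option with the highest expected revenue
best_price = max(expected_revenue, key=expected_revenue.get)
print("Expected revenue by price:", expected_revenue)
print("Recommended price:", best_price)
```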
Basic Concepts You Should Know
Variables
A variable is a feature or characteristic in your dataset.
Example dataset:
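For instance, the student dataset used later in this guide has three variables. In pandas, each column is a variable and each row is one observation (here, one student):

```python
import pandas as pd

# The student dataset used later in this guide
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [23, 25, 22, 24, 23],
    "Score": [85, 90, 78, 88, 92],
})

# The variables are the column names
print(list(df.columns))
# (number of observations, number of variables)
print(df.shape)
```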
Types of Data
1. Numerical Data
2. Categorical Data
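Numerical data consists of numbers you can do arithmetic on, such as Age and Score; categorical data places each observation into a label or group. As a sketch, pandas can separate the two by column type with `select_dtypes`. Here the text column Name is treated as the non-numeric (categorical) side, which is an assumption about how you want to classify it.

```python
import pandas as pd

# The student dataset used later in this guide
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [23, 25, 22, 24, 23],
    "Score": [85, 90, 78, 88, 92],
})

# Columns stored as numbers
numerical = df.select_dtypes(include="number").columns.tolist()
# Everything else (here, text labels)
categorical = df.select_dtypes(exclude="number").columns.tolist()

print("Numerical:", numerical)
print("Categorical:", categorical)
```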
Mean, Median, Mode
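The mean is the sum of the values divided by how many there are, the median is the middle value after sorting, and the mode is the most frequent value. A short sketch using the Age column of the student dataset used later in this guide:

```python
import pandas as pd

# Age column from the student dataset
ages = pd.Series([23, 25, 22, 24, 23])

print("Mean:", ages.mean())            # (23 + 25 + 22 + 24 + 23) / 5
print("Median:", ages.median())        # middle of 22, 23, 23, 24, 25
print("Mode:", ages.mode().tolist())   # most frequent value(s)
```

`mode()` returns a Series rather than a single number because several values can tie for most frequent.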
Tools Used in Data Analysis
Common tools:
In this guide, we’ll use Python, since it is beginner-friendly and powerful.
Introduction to Data Analysis with Python
We’ll use two important libraries: pandas, for loading and working with tabular data, and matplotlib, for creating charts.
Step 1: Install Libraries (if needed)
pip install pandas matplotlib
Step 2: Import Libraries
import pandas as pd
import matplotlib.pyplot as plt
Step 3: Create a Simple Dataset
data = {
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [23, 25, 22, 24, 23],
    "Score": [85, 90, 78, 88, 92]
}
df = pd.DataFrame(data)
print(df)
Step 4: Basic Exploration
View first rows
print(df.head())
Get summary statistics
print(df.describe())
Step 5: Calculate Mean Score
mean_score = df["Score"].mean()
print("Average Score:", mean_score)
Step 6: Filter Data
Find students with score above 85:
high_scores = df[df["Score"] > 85]
print(high_scores)
Step 7: Create a Simple Chart
plt.bar(df["Name"], df["Score"])
plt.title("Student Scores")
plt.xlabel("Name")
plt.ylabel("Score")
plt.show()
Real-World Example Scenario
Imagine you own a small shop.
You collect:
With data analysis, you can:
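As a minimal sketch of this scenario, assuming a few hypothetical transactions (dates, items, and quantities are illustrative), pandas can find the best-selling item and the busiest day of the week:

```python
import pandas as pd

# Hypothetical shop transactions (illustrative data)
orders = pd.DataFrame({
    "Date": pd.to_datetime([
        "2024-03-04", "2024-03-04", "2024-03-05",
        "2024-03-09", "2024-03-09", "2024-03-09",
    ]),
    "Item": ["Coffee", "Tea", "Coffee", "Cake", "Coffee", "Cake"],
    "Quantity": [3, 1, 2, 4, 5, 2],
})

# Best-selling item by total quantity
best_item = orders.groupby("Item")["Quantity"].sum().idxmax()

# Busiest day of the week by total quantity
orders["Weekday"] = orders["Date"].dt.day_name()
busiest_day = orders.groupby("Weekday")["Quantity"].sum().idxmax()

print("Best-selling item:", best_item)
print("Busiest day:", busiest_day)
```

Insights like these could guide which items to restock and when to schedule extra staff.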
Why Data Analysis is Important
Data analysis helps:
Today, data analysis is used across a wide range of industries.
Skills Required to Become a Data Analyst
Complete Script
# Install libraries first if needed:
# pip install pandas matplotlib
import pandas as pd
import matplotlib.pyplot as plt
# Create dataset
data = {
"Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
"Age": [23, 25, 22, 24, 23],
"Score": [85, 90, 78, 88, 92]
}
# Create DataFrame
df = pd.DataFrame(data)
# Display dataset
print("Full Dataset:")
print(df)
# Show first rows
print("\nFirst 5 Rows:")
print(df.head())
# Summary statistics
print("\nSummary Statistics:")
print(df.describe())
# Calculate mean score
mean_score = df["Score"].mean()
print("\nAverage Score:", mean_score)
# Filter students with score > 85
high_scores = df[df["Score"] > 85]
print("\nStudents with Score > 85:")
print(high_scores)
# Create bar chart
plt.bar(df["Name"], df["Score"])
plt.title("Student Scores")
plt.xlabel("Name")
plt.ylabel("Score")
plt.show()