Data Analysis

Introduction


Data Analysis is the process of collecting, cleaning, organizing, and examining data to discover useful information, draw conclusions, and support decision-making.

In simple terms:

Data analysis helps you turn raw numbers into meaningful insights.

For example:

  • A business analyzes sales data to see which products sell the most.
  • A school analyzes student scores to identify subjects that need improvement.
  • A hospital analyzes patient data to improve treatment outcomes.

The Data Analysis Process (Step-by-Step)

Data analysis usually follows these steps:

1️⃣ Define the Question

Before touching data, ask:

  • What problem are we solving?
  • What do we want to learn?

Example:

“Which product generates the highest revenue?”

2️⃣ Collect the Data

Data can come from:

  • Databases
  • Excel files
  • Surveys
  • APIs
  • Sensors
  • Websites
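
As a rough illustration of the collection step, the sketch below loads tabular data with pandas (one of the libraries introduced later in this guide). The CSV text, its column names, and the `sales` variable are made up for this example; in practice you would usually pass a file path to `pd.read_csv` instead of an in-memory string.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file or export
csv_text = """Product,Units,Price
Notebook,10,2.5
Pen,40,0.5
Backpack,3,20.0
"""

# pd.read_csv accepts a file path or any file-like object
sales = pd.read_csv(io.StringIO(csv_text))

print(sales)
print("Rows and columns:", sales.shape)
```

Other sources have their own readers, for example `pd.read_excel` for Excel files.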

3️⃣ Clean the Data

Real-world data is messy. You may find:

  • Missing values
  • Duplicate records
  • Incorrect formats
  • Outliers

Cleaning makes the data more reliable for analysis.
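
A minimal sketch of two common cleaning steps, assuming a small made-up table with one duplicate row and one missing score. Filling the gap with the median is only one possible choice; depending on the question, dropping the row may be more appropriate.

```python
import pandas as pd

# Hypothetical messy data: "Bob" appears twice and Charlie's score is missing
raw = pd.DataFrame({
    "Name": ["Alice", "Bob", "Bob", "Charlie", "David"],
    "Score": [85.0, 90.0, 90.0, None, 88.0],
})

# Remove exact duplicate rows (the second "Bob" row)
clean = raw.drop_duplicates().copy()

# Count missing values per column before fixing them
print(clean.isna().sum())

# Fill the missing score with the median of the known scores
median_score = clean["Score"].median()
clean["Score"] = clean["Score"].fillna(median_score)

print(clean)
```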

4️⃣ Explore the Data (Exploratory Data Analysis - EDA)

This step helps you understand:

  • Patterns
  • Trends
  • Relationships
  • Distributions

You use:

  • Summary statistics
  • Charts and graphs
  • Correlation analysis
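
The sketch below shows what summary statistics and a correlation check might look like in pandas, using the ages and scores from the small student dataset that appears later in this guide. With only five rows, the correlation value is illustrative rather than meaningful.

```python
import pandas as pd

# Ages and scores from the guide's student dataset
df = pd.DataFrame({
    "Age": [23, 25, 22, 24, 23],
    "Score": [85, 90, 78, 88, 92],
})

# Summary statistics: count, mean, std, min, quartiles, max
print(df.describe())

# Pearson correlation between the two numerical columns
corr = df["Age"].corr(df["Score"])
print("Correlation between Age and Score:", round(corr, 2))
```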

5️⃣ Analyze the Data

Apply methods like:

  • Statistical analysis
  • Aggregation
  • Grouping
  • Predictive modeling
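
Grouping and aggregation can be sketched with pandas `groupby`. The regions and revenue figures below are invented for illustration.

```python
import pandas as pd

# Hypothetical sales records
sales = pd.DataFrame({
    "Region": ["North", "South", "North", "South", "North"],
    "Revenue": [100, 150, 200, 50, 300],
})

# Group rows by region, then aggregate revenue within each group
by_region = sales.groupby("Region")["Revenue"].agg(["sum", "mean"])

print(by_region)
```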

6️⃣ Communicate Results

An analysis has little value if the people who need it can’t understand it.

You present:

  • Reports
  • Dashboards
  • Visualizations
  • Recommendations

Types of Data Analysis

There are four main types:

1. Descriptive Analysis

What happened?

Examples:

  • Total sales last month.
  • Average test score.

2. Diagnostic Analysis

Why did it happen?

Example:

  • Sales dropped because website traffic decreased.

3. Predictive Analysis

What will happen?

Example:

  • Predict next month’s sales using historical data.

4. Prescriptive Analysis

What should we do?

Example:

  • Increase marketing spending in high-performing regions.
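
To make the difference between descriptive and predictive analysis concrete, here is a small sketch on invented monthly sales figures. It uses NumPy (installed alongside pandas) and a naive straight-line fit, which is only a stand-in for a real forecasting method.

```python
import numpy as np

# Hypothetical sales for months 1 to 6, invented for illustration
months = np.array([1, 2, 3, 4, 5, 6])
sales = np.array([100, 110, 120, 130, 140, 150])

# Descriptive: what happened?
total_sales = sales.sum()
average_sales = sales.mean()
print("Total:", total_sales, "Average:", average_sales)

# Predictive: fit a straight line and extend it to month 7
slope, intercept = np.polyfit(months, sales, 1)
next_month_forecast = slope * 7 + intercept
print("Forecast for month 7:", round(next_month_forecast, 1))
```

Real sales rarely follow a perfect line, so treat a forecast like this as a starting point for discussion, not a prediction to rely on.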

Basic Concepts You Should Know

Variables

A variable is a feature or characteristic in your dataset.

Example dataset:

  • Name → Categorical variable
  • Age → Numerical variable
  • Score → Numerical variable

Types of Data

1. Numerical Data

  • Continuous (height, weight)
  • Discrete (number of students)

2. Categorical Data

  • Gender
  • Country
  • Product category
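
One way to see which columns are numerical and which are not is to inspect the column types that pandas infers. The sketch below uses the guide's student dataset; the variable names `numeric_cols` and `other_cols` are chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [23, 25, 22, 24, 23],
    "Score": [85, 90, 78, 88, 92],
})

# Show the type pandas inferred for each column
print(df.dtypes)

# Split the column names into numerical and non-numerical
numeric_cols = df.select_dtypes(include="number").columns.tolist()
other_cols = df.select_dtypes(exclude="number").columns.tolist()

print("Numerical:", numeric_cols)
print("Non-numerical:", other_cols)
```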

Mean, Median, Mode

  • Mean → Average value (the sum of all values divided by how many there are)
  • Median → Middle value when the values are sorted
  • Mode → Most frequent value
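
As a quick check of these three definitions, the sketch below computes them for the ages in the guide's student dataset. Note that `mode()` returns every value tied for most frequent, so it gives back a list-like result, not a single number.

```python
import pandas as pd

# Ages from the guide's student dataset
ages = pd.Series([23, 25, 22, 24, 23])

mean_age = ages.mean()            # (23 + 25 + 22 + 24 + 23) / 5
median_age = ages.median()        # middle of 22, 23, 23, 24, 25
mode_ages = ages.mode().tolist()  # most frequent value(s)

print("Mean:", mean_age)
print("Median:", median_age)
print("Mode:", mode_ages)
```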

Tools Used in Data Analysis

Common tools:

  • Excel
  • SQL
  • Python
  • R
  • Power BI
  • Tableau

In this guide, we’ll use Python, since it is beginner-friendly and powerful.

Introduction to Data Analysis with Python

We’ll use two important libraries:

  • Pandas → for handling data
  • Matplotlib → for visualization

Step 1: Install Libraries (if needed)

pip install pandas matplotlib

Step 2: Import Libraries

import pandas as pd
import matplotlib.pyplot as plt

Step 3: Create a Simple Dataset

data = {
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [23, 25, 22, 24, 23],
    "Score": [85, 90, 78, 88, 92]
}

df = pd.DataFrame(data)
print(df)

Step 4: Basic Exploration

View the first rows

print(df.head())

Get summary statistics

print(df.describe())

Step 5: Calculate Mean Score

mean_score = df["Score"].mean()
print("Average Score:", mean_score)

Step 6: Filter Data

Find students with a score above 85:

high_scores = df[df["Score"] > 85]
print(high_scores)
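
Filters can also be combined. The sketch below, which rebuilds the same dataset so it runs on its own, keeps students who score above 85 and are younger than 25, then sorts them from highest to lowest score. The age condition is an extra assumption added for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [23, 25, 22, 24, 23],
    "Score": [85, 90, 78, 88, 92],
})

# Combine conditions with & (and); each condition needs its own parentheses
young_high_scores = df[(df["Score"] > 85) & (df["Age"] < 25)]

# Sort the result from highest to lowest score
young_high_scores = young_high_scores.sort_values("Score", ascending=False)

print(young_high_scores)
```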

Step 7: Create a Simple Chart

plt.bar(df["Name"], df["Score"])
plt.title("Student Scores")
plt.xlabel("Name")
plt.ylabel("Score")
plt.show()

Real-World Example Scenario

Imagine you own a small shop.

You collect:

  • Date
  • Product
  • Sales amount

With data analysis, you can:

  • Identify best-selling products
  • Find low-sales days
  • Predict busy seasons
  • Optimize inventory
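
A minimal sketch of the first two ideas, assuming a tiny invented sales log with the three columns listed above. The product names, dates, and amounts are hypothetical.

```python
import pandas as pd

# Hypothetical shop records: one row per product per day
shop = pd.DataFrame({
    "Date": pd.to_datetime([
        "2024-01-01", "2024-01-01",
        "2024-01-02", "2024-01-02",
        "2024-01-03",
    ]),
    "Product": ["Tea", "Coffee", "Tea", "Coffee", "Tea"],
    "Sales": [20.0, 35.0, 15.0, 40.0, 5.0],
})

# Best-selling product: total sales per product, largest first
product_totals = shop.groupby("Product")["Sales"].sum().sort_values(ascending=False)
best_product = product_totals.index[0]

# Low-sales day: total sales per date, then the date with the smallest total
daily_totals = shop.groupby("Date")["Sales"].sum()
lowest_day = daily_totals.idxmin()

print(product_totals)
print("Best-selling product:", best_product)
print("Lowest-sales day:", lowest_day.date())
```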

Why Data Analysis is Important

Data analysis helps organizations:

  • Make better decisions
  • Reduce risks
  • Increase profits
  • Improve customer satisfaction
  • Support scientific discoveries

Today, almost every industry uses data analysis.


Skills Required to Become a Data Analyst

  • Basic mathematics & statistics
  • Excel or spreadsheets
  • SQL
  • Python or R
  • Data visualization
  • Critical thinking


Complete Example Code

The script below combines the Python steps above into a single file:

# Install libraries first if needed:
# pip install pandas matplotlib

import pandas as pd
import matplotlib.pyplot as plt

# Create dataset
data = {
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [23, 25, 22, 24, 23],
    "Score": [85, 90, 78, 88, 92]
}

# Create DataFrame
df = pd.DataFrame(data)

# Display dataset
print("Full Dataset:")
print(df)

# Show first rows
print("\nFirst 5 Rows:")
print(df.head())

# Summary statistics
print("\nSummary Statistics:")
print(df.describe())

# Calculate mean score
mean_score = df["Score"].mean()
print("\nAverage Score:", mean_score)

# Filter students with score > 85
high_scores = df[df["Score"] > 85]
print("\nStudents with Score > 85:")
print(high_scores)

# Create bar chart
plt.bar(df["Name"], df["Score"])
plt.title("Student Scores")
plt.xlabel("Name")
plt.ylabel("Score")
plt.show()