Week 3 • Monday

Block 21: Series & DataFrame Basics

Create Series and DataFrames from scratch and understand their structure.

Concepts

Code Examples

Series basics

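import pandas as pd

# Build a Series with explicit string labels as its index
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(s)
# .values exposes the underlying data as a NumPy array
print(s.values, type(s.values))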
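
The index labels are what distinguish a Series from a plain list. As a minimal sketch that rebuilds the Series above, the following shows label-based and position-based access side by side; the variable names are only illustrative.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

# The index holds the labels for each value.
labels = s.index.tolist()

# Label-based access uses the index; position-based access uses .iloc.
by_label = s['b']
by_position = s.iloc[1]

print(labels)                 # ['a', 'b', 'c']
print(by_label, by_position)  # 20 20
```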

DataFrame from dict

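import pandas as pd

# Each dict key becomes a column name; each list becomes that column's values
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
print(df.shape, df.columns.tolist())  # (rows, columns) and the column names
df.info()  # prints the index range, column dtypes, and non-null counts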
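
To make the structure more concrete, here is a short sketch using the same two-column DataFrame. It illustrates that each column is itself a Series and that all columns share one row index; the names col_a, n_rows, and n_cols are chosen only for this example.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Selecting a single column returns a Series.
col_a = df['A']
print(type(col_a).__name__)          # Series

# Without an explicit index, rows are labeled 0, 1, 2, ...
print(df.index.tolist())             # [0, 1, 2]

# The column shares the DataFrame's row index.
print(col_a.index.equals(df.index))  # True

# shape is a (rows, columns) tuple.
n_rows, n_cols = df.shape
print(n_rows, n_cols)                # 3 2
```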

Exercise

Create a DataFrame of 5 cities with population, area, and country columns. Print the dtype of each column, then call describe() to summarize the numeric ones.

Solution

import pandas as pd

df = pd.DataFrame({
    'city': ['Tokyo', 'Delhi', 'Shanghai', 'São Paulo', 'Mumbai'],
    'population': [37.4, 31.0, 27.1, 22.4, 20.7],  # in millions
    'area_km2': [2194, 1484, 6341, 1521, 603],
    'country': ['Japan', 'India', 'China', 'Brazil', 'India']
})
print(df.dtypes)      # dtype of each column
print(df.describe())  # by default, summarizes only the numeric columns

Practice Problems

Problem 1: Create a Series from a dict. What happens to the dict keys?

Hint: pd.Series({'a':1, 'b':2})

Problem 2: Create a DataFrame from a list of dicts. When is this useful?

Hint: pd.DataFrame([{'x':1}, {'x':2}])

Application

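DataFrames are the primary structure for tabular data in Python's pandas library. Data from CSV files, Excel workbooks, and database queries is typically loaded into a DataFrame, for example with pd.read_csv(), pd.read_excel(), or pd.read_sql().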
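
As one illustration of the database case, the sketch below loads a query result into a DataFrame with pd.read_sql_query(). The in-memory SQLite database and its "cities" table are made up for this example and stand in for whatever database a real project would use.

```python
import sqlite3

import pandas as pd

# Hypothetical in-memory database with a made-up "cities" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (city TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO cities VALUES (?, ?)",
    [("Tokyo", "Japan"), ("Delhi", "India")],
)
conn.commit()

# The query result comes back as a DataFrame, one column per selected field.
df_sql = pd.read_sql_query("SELECT city, country FROM cities", conn)
conn.close()

print(df_sql.shape)             # (2, 2)
print(df_sql.columns.tolist())  # ['city', 'country']
```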

Case Study

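A marketing team exports campaign data to CSV. Loading the file with pd.read_csv() returns a DataFrame, and calling .describe() on it summarizes the numeric columns, such as spend, clicks, and conversions, with the count, mean, standard deviation, minimum, quartiles, and maximum of each.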
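
The workflow in this case study can be sketched as follows. The CSV text, the campaign names, and the numbers are invented for illustration, and io.StringIO stands in for the exported file so the example runs without anything on disk.

```python
import io

import pandas as pd

# Made-up stand-in for the exported CSV; column names and values are assumptions.
csv_text = """campaign,spend,clicks,conversions
spring_sale,1200.0,3400,85
newsletter,300.0,900,40
retargeting,750.0,2100,63
"""

# read_csv accepts a file path or any file-like object.
campaigns = pd.read_csv(io.StringIO(csv_text))

# describe() skips the text column and summarizes the numeric ones.
summary = campaigns.describe()
print(summary)
print(summary.loc['mean', 'spend'])  # 750.0
```

With a real export, the io.StringIO(...) argument would be replaced by the path to the CSV file.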

Visualization

After creating the cities DataFrame, plot population vs area as a scatter plot using df.plot(kind='scatter', x='area_km2', y='population').
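
A minimal sketch of that plot, assuming matplotlib is installed (pandas uses it for plotting) and that the population figures from the exercise are in millions. The axis labels and title are illustrative choices.

```python
import pandas as pd

# Cities data from the exercise; population is assumed to be in millions.
df = pd.DataFrame({
    'city': ['Tokyo', 'Delhi', 'Shanghai', 'São Paulo', 'Mumbai'],
    'population': [37.4, 31.0, 27.1, 22.4, 20.7],
    'area_km2': [2194, 1484, 6341, 1521, 603],
})

# df.plot returns a matplotlib Axes, which can be customized further.
ax = df.plot(kind='scatter', x='area_km2', y='population')
ax.set_xlabel('Area (km²)')
ax.set_ylabel('Population (millions)')
ax.set_title('Population vs. area')
```

In a notebook the figure renders automatically. In a script, call ax.figure.savefig() with an output path, or matplotlib.pyplot.show(), to see it.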

Homework

Compare a pandas Series to a NumPy array: name two differences and one similarity.