Block 21: Series & DataFrame Basics
Create Series and DataFrames from scratch and understand their structure.
Concepts
- pd.Series from list/dict
- pd.DataFrame from dict of lists or list of dicts
- Index, columns, values
- .shape, .dtypes, .info(), .describe()
Code Examples
Series basics
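import pandas as pd
# Build a Series with explicit index labels instead of the default 0, 1, 2
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(s)
# .values exposes the underlying data as a NumPy array
print(s.values, type(s.values))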
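The Concepts list also mentions building a Series from a dict, which the example above does not show. A minimal sketch, assuming made-up fruit prices (the `prices` name and its data are illustrative only):

```python
import pandas as pd

# Hypothetical data: dict keys become the index labels, dict values become the data.
prices = pd.Series({'apple': 1.5, 'banana': 0.25, 'cherry': 3.0})

print(prices.index.tolist())  # labels taken from the dict keys
print(prices['banana'])       # label-based lookup, similar to a dict
print(prices.dtype)           # a Series holds a single dtype
```

Compared with the list-based constructor, no separate index argument is needed because the keys already supply the labels.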
DataFrame from dict
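import pandas as pd
# Each dict key becomes a column name; each list becomes that column's values
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
print(df.shape, df.columns.tolist())  # (3, 2) ['A', 'B']
df.info()  # prints the index range, column dtypes, non-null counts, and memory usage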
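The other constructor named in the Concepts list, a list of dicts, can be sketched the same way, along with the index, columns, and values attributes. The `rows` records and the `inventory` name are hypothetical and chosen only for illustration:

```python
import pandas as pd

# Hypothetical records: each dict is one row, and the keys become column names.
rows = [
    {'product': 'pen', 'units': 12},
    {'product': 'notebook', 'units': 5},
    {'product': 'stapler', 'units': 2},
]
inventory = pd.DataFrame(rows)

print(inventory.index)       # default RangeIndex, one entry per row
print(inventory.columns)     # column labels taken from the dict keys
print(inventory.values)      # the underlying data as a 2-D NumPy array
print(inventory.dtypes)      # one dtype per column
print(inventory.describe())  # summary statistics; by default only numeric columns
```

A dict of lists is organized by column, while a list of dicts is organized by row. Both end up as the same kind of DataFrame, so the choice mostly depends on how the source data arrives.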
Exercise
Create a DataFrame of 5 cities with population, area, and country columns. Print the dtype of each column and call describe() on the numeric ones.
Solution
import pandas as pd
df = pd.DataFrame({
    'city': ['Tokyo', 'Delhi', 'Shanghai', 'São Paulo', 'Mumbai'],
    'population': [37.4, 31.0, 27.1, 22.4, 20.7],  # millions of people
    'area_km2': [2194, 1484, 6341, 1521, 603],
    'country': ['Japan', 'India', 'China', 'Brazil', 'India']
})
print(df.dtypes)      # one dtype per column
print(df.describe())  # by default summarizes only the numeric columns
Practice Problems
Hint: pd.Series({'a':1, 'b':2})
Hint: pd.DataFrame([{'x':1}, {'x':2}])
Application
DataFrames are the primary data structure for tabular data in pandas. CSV files, Excel sheets, and database query results all load into a DataFrame through readers such as pd.read_csv(), pd.read_excel(), and pd.read_sql().
Case Study
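A marketing team exports campaign data to CSV. Loading it with pd.read_csv() gives a DataFrame. Calling .describe() then summarizes each numeric column, such as spend, clicks, and conversions, with its count, mean, standard deviation, minimum, quartiles, and maximum.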
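A minimal sketch of this workflow follows. The column names and figures are invented for illustration, and the CSV is read from an in-memory string so the example runs without a file. A real export would pass a file path to pd.read_csv() instead.

```python
import io
import pandas as pd

# Made-up campaign export (hypothetical columns and numbers).
csv_text = """campaign,spend,clicks,conversions
spring_sale,1200.50,3400,120
summer_promo,980.00,2900,95
fall_launch,1500.75,4100,160
"""

# read_csv accepts a file path or any file-like object.
campaigns = pd.read_csv(io.StringIO(csv_text))

# describe() returns a DataFrame of summary statistics for the numeric columns.
summary = campaigns.describe()
print(summary)
print(summary.loc['mean', 'spend'])  # pull one statistic out by row and column label
```

Because the summary is itself a DataFrame, individual statistics can be selected with .loc rather than read off the printed table.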
Visualization
After creating the cities DataFrame, plot population against area as a scatter plot using df.plot(kind='scatter', x='area_km2', y='population'). pandas plotting relies on matplotlib, so it must be installed.
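One way to run this plot as a standalone script is sketched below. It assumes matplotlib is installed and that the population figures are in millions. The axis labels, the non-interactive Agg backend, and the output file name cities_scatter.png are illustrative choices, not part of the exercise.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the script runs without a display

import pandas as pd

# Same cities data as in the exercise solution (numeric columns only are needed here).
df = pd.DataFrame({
    'city': ['Tokyo', 'Delhi', 'Shanghai', 'São Paulo', 'Mumbai'],
    'population': [37.4, 31.0, 27.1, 22.4, 20.7],  # assumed to be millions
    'area_km2': [2194, 1484, 6341, 1521, 603],
})

# df.plot returns a matplotlib Axes, which can be customized further.
ax = df.plot(kind='scatter', x='area_km2', y='population')
ax.set_xlabel('Area (km²)')
ax.set_ylabel('Population (millions)')

# Hypothetical output file name; in a notebook the figure would display inline instead.
ax.figure.savefig('cities_scatter.png')
```

In a Jupyter notebook the backend line and the savefig call can be dropped, since the figure renders directly under the cell.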
Homework
Compare a pandas Series to a NumPy array: what are two differences and one similarity?