Week 15 • Tuesday

Block 143: Capstone: Data Cleaning & EDA

Clean raw data and explore key patterns.

Concepts

Applying the cleaning pipeline from Week 5
EDA with pandas + Seaborn: 3–5 key plots
Identifying outliers, distributions, correlations
Documenting every cleaning decision
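
One common way to flag outliers and check correlations is the 1.5 × IQR rule plus a Pearson correlation matrix. The sketch below assumes a small, made-up DataFrame with hypothetical columns `price` and `units`; the helper name `iqr_outliers` is also illustrative, not something from the course materials.

```python
import pandas as pd


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask for values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


# Hypothetical toy data; replace with your own DataFrame.
df = pd.DataFrame({
    "price": [10.0, 12.0, 11.0, 13.0, 12.5, 95.0],
    "units": [100, 90, 95, 85, 88, 5],
})

mask = iqr_outliers(df["price"])
corr = df.corr()  # pairwise Pearson correlations (all columns here are numeric)

print(df[mask])                 # rows flagged as price outliers
print(df["price"].describe())   # quick numeric summary of the distribution
print(corr)
```

Flagging is not the same as removing: whether a flagged row is an error or a real extreme value is a judgment call that belongs in the cleaning log.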

Code Examples

See exercise below.
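
As a possible starting point, here is a minimal cleaning sketch covering the four steps named in the exercise. The inline CSV, the column names (`id`, `city`, `signup_date`, `amount`), the output filename, and the choices made (drop rows with an unparseable date, fill missing amounts with the median) are all assumptions for illustration; your own data will call for different decisions.

```python
import io

import pandas as pd

# Hypothetical raw data standing in for your capstone file.
RAW_CSV = """id,city,signup_date,amount
1, Boston ,2024-01-05,10.5
2,boston,2024-01-06,
2,boston,2024-01-06,
3,CHICAGO,not a date,7
4,Chicago,2024-02-01,8.5
"""


def clean(raw: pd.DataFrame) -> pd.DataFrame:
    df = raw.copy()
    # Normalize text: strip stray whitespace and unify capitalization.
    df["city"] = df["city"].str.strip().str.title()
    # Fix types: unparseable values become NaT/NaN instead of raising.
    df["signup_date"] = pd.to_datetime(
        df["signup_date"], format="%Y-%m-%d", errors="coerce"
    )
    df["amount"] = pd.to_numeric(df["amount"], errors="coerce")
    # Remove exact duplicate rows.
    df = df.drop_duplicates()
    # Handle missing values: drop rows without a valid date,
    # then fill missing amounts with the median of the remaining rows.
    df = df.dropna(subset=["signup_date"])
    df = df.assign(amount=df["amount"].fillna(df["amount"].median()))
    return df.reset_index(drop=True)


# Read everything as text first so type fixes are explicit.
raw = pd.read_csv(io.StringIO(RAW_CSV), dtype=str)
cleaned = clean(raw)
cleaned.to_csv("cleaned.csv", index=False)
print(cleaned)
```

Reading every column as text and converting explicitly keeps each type decision visible, which makes it easier to document later.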

Exercise

Clean your raw data: fix types, handle missing values, remove duplicates, and normalize text if needed. Save the cleaned data as a CSV. Create 4 EDA plots that reveal key patterns, and write a one-sentence insight under each.

Homework

Write a 'cleaning log' that lists every decision you made: what was removed or changed, and why.
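
A cleaning log can be as simple as one structured record per decision. The sketch below is one possible shape, not a required format: the `CleaningDecision` class, its fields, and the example entries (including the row counts and reasons) are all hypothetical.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class CleaningDecision:
    step: str          # what was removed or changed
    rows_before: int   # row count before the step
    rows_after: int    # row count after the step
    reason: str        # why the decision was made

    @property
    def rows_removed(self) -> int:
        return self.rows_before - self.rows_after


# Hypothetical entries; record your own steps, counts, and reasons.
log = [
    CleaningDecision("drop exact duplicates", 5, 4,
                     "same record appeared twice in the export"),
    CleaningDecision("drop rows with unparseable signup_date", 4, 3,
                     "date is required for the time-based plots"),
    CleaningDecision("fill missing amount with median", 3, 3,
                     "few values missing; median is robust to outliers"),
]


def to_csv(entries: list[CleaningDecision]) -> str:
    """Serialize the log as CSV text with one row per decision."""
    buf = io.StringIO()
    names = [f.name for f in fields(CleaningDecision)]
    writer = csv.DictWriter(buf, fieldnames=names, lineterminator="\n")
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buf.getvalue()


print(to_csv(log))
```

Recording row counts before and after each step makes it easy to see how much data each decision cost.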