Block 143: Capstone: Data Cleaning & EDA
Clean raw data and explore key patterns.
Concepts
- Applying the cleaning pipeline from Week 5
- EDA with pandas + Seaborn: 3–5 key plots
- Identifying outliers, distributions, correlations
- Documenting every cleaning decision
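To make the outlier, distribution, and correlation checks concrete, here is a minimal pandas sketch. The toy data and the column names `price` and `units` are hypothetical, and the 1.5 * IQR rule is one common convention for flagging outliers, not the only one.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "price": [10.0, 12.0, 11.0, 13.0, 12.0, 95.0],
    "units": [100, 90, 95, 85, 92, 5],
})

# Distribution summary: count, mean, std, min, quartiles, max.
summary = df["price"].describe()

# Flag outliers with the 1.5 * IQR rule.
q1 = df["price"].quantile(0.25)
q3 = df["price"].quantile(0.75)
iqr = q3 - q1
is_outlier = (df["price"] < q1 - 1.5 * iqr) | (df["price"] > q3 + 1.5 * iqr)
outliers = df[is_outlier]

# Pairwise Pearson correlations between the numeric columns.
corr = df.corr()

print(summary)
print(outliers)
print(corr)
```

Whether a flagged row is an error or a genuine extreme value is a judgment call; either way, record the decision.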
Code Examples
See exercise below.
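As a starting point, the cleaning steps might look like the sketch below. The data frame, its column names, and the choice to fill missing ages with the median are all assumptions for illustration; your own data will need different decisions. The output is written to the system temp directory only so the sketch runs anywhere.

```python
import os
import tempfile

import pandas as pd

# Hypothetical raw data standing in for your own file,
# which you would normally load with pd.read_csv(...).
raw = pd.DataFrame({
    "name": [" Alice ", "BOB", "bob", "Cara", None],
    "age": ["34", "n/a", "n/a", "29", "41"],
    "signup": ["2024-01-05", "2024-02-10", "2024-02-10",
               "not a date", "2024-03-01"],
})

df = raw.copy()

# 1. Normalize text: strip whitespace and unify capitalization.
df["name"] = df["name"].str.strip().str.title()

# 2. Fix types: unparseable values become NaN / NaT instead of raising.
df["age"] = pd.to_numeric(df["age"], errors="coerce")
df["signup"] = pd.to_datetime(df["signup"], errors="coerce")

# 3. Remove duplicates (after normalizing, so "BOB" and "bob" match).
df = df.drop_duplicates()

# 4. Handle missing values: drop rows with no name,
#    fill missing ages with the median (an illustrative choice).
df = df.dropna(subset=["name"])
df = df.assign(age=df["age"].fillna(df["age"].median()))

# 5. Save the cleaned data; replace this path with your project path.
out_path = os.path.join(tempfile.gettempdir(), "cleaned.csv")
df.to_csv(out_path, index=False)
print(df)
```

Duplicates are removed before imputing so that repeated rows do not skew the median.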
Exercise
Clean your raw data: fix types, handle missing values, remove duplicates, and normalize text if needed. Save the cleaned data as a CSV. Create 4 EDA plots that reveal key patterns, and write a one-sentence insight under each plot.
Homework
Write a 'cleaning log' that lists every decision made: what was removed/changed and why.
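One way to keep the log honest is to record each step as you run it. The helper `log_step`, the toy data, and the reasons given below are hypothetical; this is a sketch of the idea, not a required format.

```python
import pandas as pd

log = []

def log_step(step, before, after, reason):
    """Record what a cleaning step did and why, then return the result."""
    log.append({
        "step": step,
        "rows_before": len(before),
        "rows_after": len(after),
        "reason": reason,
    })
    return after

# Hypothetical raw data.
raw = pd.DataFrame({"id": [1, 1, 2, 3], "score": [5.0, 5.0, None, 7.0]})

df = log_step(
    "drop exact duplicates", raw, raw.drop_duplicates(),
    "the same record appears twice",
)
df = log_step(
    "drop rows missing score", df, df.dropna(subset=["score"]),
    "score is the analysis target and was not imputed",
)

cleaning_log = pd.DataFrame(log)
print(cleaning_log.to_string(index=False))
```

The resulting table can be pasted into your write-up or saved alongside the cleaned CSV.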