Week 5 • Thursday

Block 48: Building a Data Cleaning Pipeline

Combine file I/O, regex, and pandas into a reusable pipeline.

Concepts

Code Examples

See exercise below.

Exercise

Build a pipeline: read a messy CSV → strip whitespace and fix casing → fill missing values → export the cleaned CSV with a timestamp in the filename. Add a function that generates a short cleaning report (rows before/after, nulls removed).
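
One possible starting point for the exercise is sketched below. It is not a reference solution. The function names (clean_frame, cleaning_report, run_pipeline), the "unknown" placeholder, title casing for text, the median fill for numeric columns, and the step that drops fully empty rows are all assumptions made for illustration; swap in whatever rules fit your data. The sketch also assumes every column is either numeric or plain text, which is what pd.read_csv produces by default.

```python
import re
from datetime import datetime
from pathlib import Path

import pandas as pd


def clean_frame(df: pd.DataFrame, fill_value: str = "unknown") -> pd.DataFrame:
    """Strip whitespace, normalize casing, and fill missing values."""
    out = df.copy()

    # Headers: trim, lowercase, and turn whitespace runs into underscores.
    out.columns = [re.sub(r"\s+", "_", str(c).strip().lower()) for c in out.columns]

    # Assumption: rows with no data at all are dropped, so row counts can change.
    out = out.dropna(how="all").reset_index(drop=True)

    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            # Assumption: numeric gaps are filled with the column median.
            out[col] = out[col].fillna(out[col].median())
        else:
            # Assumption: any non-numeric column holds text.
            text = out[col].str.strip()
            text = text.str.replace(r"\s+", " ", regex=True).str.title()
            text = text.mask(text == "")  # whitespace-only cells count as missing
            out[col] = text.fillna(fill_value)
    return out


def cleaning_report(before: pd.DataFrame, after: pd.DataFrame) -> dict:
    """Summarize row counts and how many null cells the cleaning removed."""
    nulls_before = int(before.isna().sum().sum())
    nulls_after = int(after.isna().sum().sum())
    return {
        "rows_before": len(before),
        "rows_after": len(after),
        "nulls_removed": nulls_before - nulls_after,
    }


def run_pipeline(src, out_dir="."):
    """Read src, clean it, and write <stem>_cleaned_<timestamp>.csv to out_dir."""
    raw = pd.read_csv(src)
    cleaned = clean_frame(raw)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    out_path = Path(out_dir) / f"{Path(src).stem}_cleaned_{stamp}.csv"
    cleaned.to_csv(out_path, index=False)
    return out_path, cleaning_report(raw, cleaned)
```

Calling run_pipeline("messy.csv") (a hypothetical input file) returns the path of the exported file together with the report dictionary. Keeping clean_frame separate from the file I/O makes it possible to test the cleaning rules on a small in-memory DataFrame before pointing the pipeline at real files.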

Homework

Sketch the pipeline as a flowchart (even on paper). Identify where errors are most likely to occur.