Block 48: Building a Data Cleaning Pipeline
Combine file I/O, regex, and pandas into a reusable pipeline.
Concepts
- Pipeline pattern: load → validate → clean → export
- Using functions to modularize each step
- Logging intermediate results with print or the logging module
- Writing cleaned output files
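The four stages listed above can be sketched as one small function each, chained by a runner. This is a minimal illustration, not a reference solution: the column names `name` and `city`, the "Unknown" fill value, and every function name here are assumptions made for the example.

```python
import io
import logging

import pandas as pd

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("pipeline")


def load(source):
    """Read CSV data from a path or a file-like object."""
    df = pd.read_csv(source)
    log.info("loaded %d rows, %d columns", len(df), df.shape[1])
    return df


def validate(df, required_columns):
    """Fail early if an expected column is missing."""
    missing = [c for c in required_columns if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return df


def clean(df):
    """Strip whitespace, normalize casing, and fill missing values."""
    df = df.copy()  # leave the caller's DataFrame untouched
    df["name"] = df["name"].str.strip().str.title()
    df["city"] = df["city"].fillna("Unknown").str.strip().str.title()
    log.info("cleaned %d rows", len(df))
    return df


def export(df, target):
    """Write the cleaned data to a path or a file-like object."""
    df.to_csv(target, index=False)
    log.info("exported %d rows", len(df))


def run_pipeline(source, target):
    """load -> validate -> clean -> export, returning the cleaned frame."""
    df = load(source)
    df = validate(df, ["name", "city"])
    df = clean(df)
    export(df, target)
    return df


# Hypothetical in-memory data standing in for a messy CSV file.
raw = io.StringIO("name,city\n  alice ,  NEW YORK\nBOB,\n")
out = io.StringIO()
result = run_pipeline(raw, out)
print(out.getvalue())
```

Keeping each stage in its own function means a failing step can be tested or logged in isolation, and the in-memory buffers can be swapped for real file paths without changing the stage functions.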
Code Examples
See exercise below.
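As a starting point, here are two small building blocks the pipeline is likely to need: a regex-based text normalizer and a helper that adds a timestamp to an output filename. Both helper names and the timestamp format are assumptions chosen for illustration; only the standard library is used.

```python
import re
from datetime import datetime
from pathlib import Path


def normalize_text(value):
    """Collapse runs of whitespace, trim the ends, and apply title case."""
    return re.sub(r"\s+", " ", value.strip()).title()


def timestamped_path(base, when=None):
    """Insert a YYYYMMDD_HHMMSS stamp before the file extension."""
    when = when or datetime.now()
    base = Path(base)
    stamp = when.strftime("%Y%m%d_%H%M%S")
    return base.with_name(f"{base.stem}_{stamp}{base.suffix}")


print(normalize_text("  new   YORK "))
# A fixed datetime keeps the example reproducible; omit it to use the current time.
print(timestamped_path("cleaned.csv", datetime(2024, 1, 31, 14, 15, 0)).name)
```

Passing the time in as an optional argument, rather than always calling `datetime.now()` inside the helper, makes the function easy to test.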
Exercise
Build a pipeline: read a messy CSV → strip whitespace and fix casing → fill missing values → export a cleaned CSV with a timestamp in the filename. Add a function that generates a short cleaning report (rows before/after, nulls removed).
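One possible shape for the report function is sketched below. It assumes the report compares a DataFrame before and after cleaning and counts "nulls removed" as the drop in missing cells; the function name, dictionary keys, and sample data are all hypothetical.

```python
import pandas as pd


def cleaning_report(before, after):
    """Summarize how a cleaning step changed a DataFrame."""
    nulls_before = int(before.isna().sum().sum())
    nulls_after = int(after.isna().sum().sum())
    return {
        "rows_before": len(before),
        "rows_after": len(after),
        "nulls_removed": nulls_before - nulls_after,
    }


# Hypothetical sample data with one missing value per column.
before = pd.DataFrame({"name": ["alice", None, "bob"], "age": [30.0, 25.0, None]})
after = before.fillna({"name": "unknown", "age": 0.0})
report = cleaning_report(before, after)
print(report)
```

Returning a plain dictionary keeps the report easy to print, log, or write alongside the exported file.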
Homework
Sketch the pipeline as a flowchart (even on paper). Identify where errors are most likely to occur.
Friday