Block 125: Building an ETL Pipeline
Extract, transform, and load (ETL) data from files into a database.
Concepts
- ETL pattern: Extract (read file) → Transform (pandas) → Load (to DB)
- Error handling in ETL with try/except
- Logging each step
- Idempotent loading: replace vs append
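The replace vs append distinction can be illustrated with a minimal sketch. It assumes pandas' `to_sql` with its `if_exists` parameter and an in-memory SQLite database; the sample data and the table names `sales_append` and `sales_replace` are hypothetical and chosen only for illustration.

```python
import sqlite3

import pandas as pd

# Hypothetical sample data standing in for one batch of extracted rows.
df = pd.DataFrame({"id": [1, 2, 3], "amount": [10.0, 20.5, 7.25]})


def row_count(conn, table):
    """Return the number of rows currently in the given table."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


conn = sqlite3.connect(":memory:")

# Append: each run adds the same rows again, so a rerun creates duplicates.
df.to_sql("sales_append", conn, if_exists="append", index=False)
df.to_sql("sales_append", conn, if_exists="append", index=False)
append_count = row_count(conn, "sales_append")

# Replace: each run drops and recreates the table, so a rerun
# leaves the table in the same state as a single run.
df.to_sql("sales_replace", conn, if_exists="replace", index=False)
df.to_sql("sales_replace", conn, if_exists="replace", index=False)
replace_count = row_count(conn, "sales_replace")

print(append_count, replace_count)  # 6 3
conn.close()
```

Loading the same batch twice leaves 6 rows in the appended table but only 3 in the replaced one. Replace is the simpler route to a repeatable load, at the cost of rewriting the whole table on every run.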
Code Examples
See exercise below.
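As a starting point, the three steps with per-step logging and try/except might be sketched as follows. This is one possible shape under stated assumptions, not a reference solution: the inline CSV, the `products` table, the logger name `etl_sketch`, and the `run_step` helper are all hypothetical, and log messages go to the console here rather than to a file.

```python
import io
import logging
import sqlite3

import pandas as pd

# Hypothetical inline CSV standing in for a real file on disk.
RAW_CSV = "id,name,price\n1,apple,1.20\n2,,0.50\n3,cherry,not_a_number\n4,plum,3.10\n"

logger = logging.getLogger("etl_sketch")
logger.setLevel(logging.INFO)
# Console output for the sketch; a logging.FileHandler could write to a file instead.
logger.addHandler(logging.StreamHandler())


def run_step(name, func, *args):
    """Run one pipeline step, logging its start, success, or failure."""
    logger.info("start %s", name)
    try:
        result = func(*args)
    except Exception:
        # Record the traceback, then re-raise so the pipeline stops.
        logger.exception("failed %s", name)
        raise
    logger.info("done %s", name)
    return result


def extract(source):
    """Read the raw CSV into a DataFrame."""
    return pd.read_csv(source)


def transform(df):
    """Coerce price to numeric, drop rows with missing values, fix types."""
    cleaned = df.assign(price=pd.to_numeric(df["price"], errors="coerce"))
    cleaned = cleaned.dropna()
    return cleaned.astype({"id": "int64", "price": "float64"})


def load(df, conn):
    """Write the cleaned rows to SQLite and return the resulting row count."""
    df.to_sql("products", conn, if_exists="replace", index=False)
    return conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]


conn = sqlite3.connect(":memory:")
raw = run_step("extract", extract, io.StringIO(RAW_CSV))
clean = run_step("transform", transform, raw)
loaded_rows = run_step("load", load, clean, conn)
conn.close()
```

In this sample, the row with a missing name and the row with an unparseable price are both dropped, so two rows reach the database. Wrapping each step in `run_step` keeps the logging and error handling in one place instead of repeating try/except in every function.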
Exercise
Build a pipeline: read CSV → clean with pandas (drop nulls, fix types) → write to SQLite → query and verify. Add a log entry for each step to an 'etl_log.txt' file.
Homework
What is idempotency in ETL? Why does it matter for scheduled pipelines?