Block 129: Scheduling & Automation Concepts
Understand how ETL pipelines are scheduled in practice.
Concepts
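- Python schedule library for simple, in-process scheduling
- Cron jobs: the operating system runs a script on a fixed timetable
- Running a script repeatedly with a loop and time.sleep()
- Idempotency (re-running a job leaves the data in the same state) and checkpointing (recording progress so a restart resumes where it left off)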
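The loop-and-sleep and idempotency ideas from the list above can be sketched with the standard library alone. This is a minimal illustration, not a production scheduler: the names run_repeatedly and load_idempotently, the max_runs parameter, and the cron script path are all assumptions made for the example.

```python
import time


def run_repeatedly(job, interval_seconds, max_runs):
    """Call `job` `max_runs` times, sleeping `interval_seconds` between calls.

    `max_runs` is a demo-only assumption so the loop ends by itself;
    a long-running script would typically use `while True:` instead.
    """
    results = []
    for i in range(max_runs):
        results.append(job())
        if i < max_runs - 1:
            time.sleep(interval_seconds)  # wait before the next run
    return results


def load_idempotently(store, records):
    """Upsert records by id, so loading the same batch twice changes nothing."""
    for record in records:
        store[record["id"]] = record  # overwrite rather than append
    return store


# The cron alternative hands the timing to the operating system.
# A crontab entry that runs a script once a minute looks like this
# (the script path is hypothetical):
#   * * * * * /usr/bin/python3 /path/to/etl_job.py
```

Because load_idempotently overwrites by key instead of appending, a job that runs twice by accident does not create duplicate rows. That property is what makes retries and overlapping schedules safe.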
Code Examples
See exercise below.
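As a starting point, here is a rough sketch of the schedule library's basic pattern: register a job with an interval, then poll run_pending() in a loop. The update_data placeholder, the run_scheduler wrapper, and its max_runs parameter are assumptions for this demo; schedule itself is a third-party package (pip install schedule).

```python
import time


def update_data():
    """Placeholder job; a real pipeline would extract, transform, and load here."""
    print("Running data update at", time.strftime("%H:%M:%S"))


def run_scheduler(job, interval_seconds=60, max_runs=None):
    """Run `job` every `interval_seconds` using the schedule library.

    `max_runs` is a demo-only assumption; with None the loop runs until
    the process is stopped (for example with Ctrl+C).
    """
    # Imported here so the placeholder job above can be used
    # even where the third-party package is not installed.
    import schedule

    runs = 0

    def counted_job():
        nonlocal runs
        job()
        runs += 1

    schedule.every(interval_seconds).seconds.do(counted_job)
    while max_runs is None or runs < max_runs:
        schedule.run_pending()  # runs any job whose interval has elapsed
        time.sleep(1)           # short pause so the loop does not spin the CPU
    schedule.clear()            # remove the registered job once the demo ends
```

Calling run_scheduler(update_data, interval_seconds=60) would start the demo. Note that schedule only fires jobs while this Python process is alive, which is the main practical difference from cron.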
Exercise
Use the schedule library to run a data update function every 60 seconds as a demo. Add a checkpoint file, 'last_run.txt', so that the script skips records it has already processed.
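One possible shape for the checkpoint part is sketched below. It assumes each record is a dict with an increasing integer "id" and that 'last_run.txt' stores the highest id processed so far; the exercise does not specify the file's contents, so a timestamp would work along the same lines. The function names are hypothetical.

```python
from pathlib import Path

CHECKPOINT = Path("last_run.txt")


def read_checkpoint(path=CHECKPOINT):
    """Return the last processed record id, or 0 if no checkpoint exists yet."""
    if not path.exists():
        return 0
    text = path.read_text().strip()
    return int(text) if text else 0


def write_checkpoint(last_id, path=CHECKPOINT):
    """Persist the id of the most recently processed record."""
    path.write_text(str(last_id))


def process_new_records(records, path=CHECKPOINT):
    """Process only records newer than the checkpoint; return the ids handled."""
    last_id = read_checkpoint(path)
    new_records = sorted(
        (r for r in records if r["id"] > last_id),
        key=lambda r: r["id"],
    )
    for record in new_records:
        # Placeholder for the real transform/load step.
        print("Processing record", record["id"])
        # Save after each record so a crash mid-batch does not redo finished work.
        write_checkpoint(record["id"], path)
    return [r["id"] for r in new_records]
```

Running process_new_records twice on the same list does the work only once, because the second call finds every id at or below the checkpoint. The scheduled function can then call it on every run without reprocessing old records.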
Homework
Research one real-world pipeline orchestration tool (Airflow, Prefect, or Dagster). Write a 100-word summary of what it does.