Block 71: ML Workflow & Train-Test Split
Understand the end-to-end supervised learning workflow.
Concepts
- Features (X) and labels (y)
- train_test_split with test_size and random_state
- Avoiding data leakage
- Scikit-learn estimator API: fit(), predict(), score()
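
The concepts above can be sketched in a few lines. This is a minimal illustration, not a reference solution: the Wine dataset, the decision tree, the 25% test size, and `random_state=0` are all assumptions chosen for the example, deliberately different from the exercise so that it remains yours to solve.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Features (X) are the measured columns; labels (y) are the classes to predict.
# The Wine dataset is an assumption made for this sketch.
X, y = load_wine(return_X_y=True)

# Hold out 25% of the rows for testing. Fixing random_state makes the
# shuffle, and therefore the split, reproducible between runs.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Estimator API: fit() learns from the training data only,
# predict() returns one label per row, score() returns mean accuracy.
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
test_accuracy = model.score(X_test, y_test)

print("train shape:", X_train.shape, "test shape:", X_test.shape)
print(f"test accuracy: {test_accuracy:.3f}")
```

For classifiers, `score()` is the fraction of test rows where `predict()` matches the true label, so it can be checked by hand against `(predictions == y_test).mean()`.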
Code Examples
See exercise below.
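
Separately from the exercise, the sketch below shows one common way to avoid leakage from preprocessing: split first, then let a pipeline fit the scaler on the training rows only. The dataset (Wine), the `StandardScaler` step, `n_neighbors=5`, and `random_state=42` are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# Split BEFORE any preprocessing, so test rows never influence
# statistics that are learned from the data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The pipeline fits the scaler and the classifier on the training split only.
pipeline = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
pipeline.fit(X_train, y_train)

# The scaler's learned means come from X_train, not from the full dataset.
scaler = pipeline.named_steps["standardscaler"]
uses_train_only = np.allclose(scaler.mean_, X_train.mean(axis=0))
print("scaler fitted on training rows only:", uses_train_only)

# At test time the pipeline reuses the training statistics to transform X_test.
leak_free_accuracy = pipeline.score(X_test, y_test)
print(f"test accuracy: {leak_free_accuracy:.3f}")
```

Calling `StandardScaler().fit(X)` on the full dataset before splitting would instead fold test-set statistics into the features the model trains on.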
Exercise
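Load the Iris dataset, split it 80/20 into training and test sets, and confirm the shapes of the resulting arrays. Fit a KNN classifier on the training set and print its test accuracy. Repeat with 3 different random_state values and observe how the accuracy changes.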
Homework
What is data leakage? Give a concrete example of how it happens and why it's catastrophic.