Week 8 • Monday

Block 71: ML Workflow & train-test split

Understand the end-to-end supervised learning workflow.

Concepts

Features (X) and labels (y)
train_test_split with test_size and random_state
Avoiding data leakage
Scikit-learn estimator API: fit(), predict(), score()
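
The concepts above can be sketched in a few lines. This is a minimal illustration, not a reference solution: the Wine dataset, the 80/20 split, KNeighborsClassifier with n_neighbors=5, and random_state=42 are all assumptions chosen for the example.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Features (X) are the measured columns; labels (y) are the class ids.
X, y = load_wine(return_X_y=True)

# Hold out 20% of the rows for testing. random_state fixes the shuffle
# so the same split is produced on every run (42 is an arbitrary choice).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Estimator API: fit() learns from the training rows only,
# predict() labels unseen rows, score() returns mean accuracy.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
test_accuracy = model.score(X_test, y_test)

print("train shape:", X_train.shape, "test shape:", X_test.shape)
print(f"test accuracy: {test_accuracy:.3f}")
```

Changing random_state produces a different shuffle, so the rows that land in the test set, and therefore the reported accuracy, can differ from run to run.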

Code Examples

See exercise below.
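
One common way leakage arises is fitting a preprocessing step on the full dataset before splitting, so that statistics from the test rows influence training. The sketch below shows one way to avoid this: split first, then put the scaler inside a pipeline so it is fitted on the training rows only. The Wine dataset, StandardScaler, stratify=y, and the parameter values are illustrative assumptions.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# Split first, so the test rows stay untouched until evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# The pipeline fits the scaler on X_train only. At score() time the
# test rows are transformed with the training mean and variance.
pipeline = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
pipeline.fit(X_train, y_train)

# make_pipeline names each step after its lowercased class name.
scaler = pipeline.named_steps["standardscaler"]
print("scaler fitted on", int(scaler.n_samples_seen_), "rows")
print(f"test accuracy: {pipeline.score(X_test, y_test):.3f}")
```

By contrast, calling StandardScaler().fit(X) on all rows before the split would let test-set statistics shape the training features, which is exactly the kind of leak the pipeline guards against.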

Exercise

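Load the Iris dataset, split it 80/20 into training and test sets, and confirm the shapes of the resulting arrays. Fit a KNN classifier on the training set and print its test accuracy. Repeat with 3 different random_state values and observe how the accuracy changes.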

Homework

What is data leakage? Give a concrete example of how it happens and explain why it makes the reported test accuracy untrustworthy.