Week 11 • Tuesday

Block 103: Bag-of-Words & TF-IDF

Convert raw text into numeric feature vectors that machine-learning models can use: raw word counts (bag-of-words) and TF-IDF weights.

Concepts
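A minimal from-scratch sketch of both ideas. It assumes a simple lowercase regex tokenizer (tokens of two or more word characters) and one common smoothed IDF variant, idf(t) = ln((1 + n) / (1 + df(t))) + 1, followed by L2 normalization of each row. This matches scikit-learn's TfidfVectorizer defaults, but other texts use plain log(n / df). The function names `tokenize`, `bag_of_words` and `tf_idf` and the three sentences are hypothetical illustrations.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase, then keep tokens of 2+ word characters (similar to scikit-learn's default pattern).
    return re.findall(r"\b\w\w+\b", text.lower())


def bag_of_words(docs):
    """Return (vocab, matrix): one row per document, one column per vocabulary term."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    counts = [Counter(toks) for toks in tokenized]
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


def tf_idf(vocab, matrix):
    """Weight raw counts by a smoothed IDF, then L2-normalize each row.

    Assumed variant: idf(t) = ln((1 + n) / (1 + df(t))) + 1.
    """
    n_docs = len(matrix)
    # df(t): number of documents in which term t appears at least once.
    df = [sum(1 for row in matrix if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    weighted = []
    for row in matrix:
        w = [tf * idf[j] for j, tf in enumerate(row)]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0  # avoid dividing by zero for empty docs
        weighted.append([x / norm for x in w])
    return weighted


docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]
vocab, counts = bag_of_words(docs)
weights = tf_idf(vocab, counts)
print(vocab)
print(counts)
```

Each row of `counts` has one column per vocabulary word, so word order is lost (that is the "bag"). In `weights`, "cat" (found in one document) ends up weighted above "sat" (found in two) in the first sentence, even though each occurs there once.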

Code Examples

See exercise below.

Exercise

Vectorize 5 example sentences with CountVectorizer and print the feature names and the resulting document-term matrix. Then compare the top TF-IDF words for 2 short documents on different topics.

Homework

Why does TF-IDF often work better than raw word counts for classification? Explain intuitively.