Week 6 • Wednesday

Block 56: Scraping Tables & Building DataFrames

Turn HTML tables into pandas DataFrames.

Concepts

Code Examples

See exercise below.
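As a warm-up before the exercise, here is a minimal sketch of pd.read_html() on a small inline table. The HTML string and its city figures are made up for illustration, so no network access is needed. Note that read_html() relies on a parser backend (for example lxml, or BeautifulSoup with html5lib) being installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical inline table standing in for a real web page.
HTML = """
<table>
  <tr><th>City</th><th>Population</th></tr>
  <tr><td>Oslo</td><td>709000</td></tr>
  <tr><td>Bergen</td><td>289000</td></tr>
</table>
"""

# read_html() returns a list of DataFrames, one per <table> it finds.
# Wrapping the string in StringIO avoids the deprecated literal-string input.
tables = pd.read_html(StringIO(HTML))
cities = tables[0]

print(len(tables))
print(cities)
```

For a real page you would typically pass a URL or downloaded HTML instead of the inline string, then pick the table you want out of the returned list.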

Exercise

Use pd.read_html() to extract a table from a Wikipedia page. Then parse a table manually, row by row, with BeautifulSoup and build a DataFrame from the rows.
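The manual half of the exercise could be sketched roughly as follows. This is one possible approach, not the only one: it assumes the first row holds the headers in th cells and every later row holds data in td cells, which real Wikipedia tables often violate (merged cells, footnote markers, nested tables). The inline HTML and its figures are made up for illustration.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical inline table standing in for a real web page.
HTML = """
<table>
  <tr><th>City</th><th>Population</th></tr>
  <tr><td>Oslo</td><td>709000</td></tr>
  <tr><td>Bergen</td><td>289000</td></tr>
</table>
"""

# "html.parser" is Python's built-in parser, so no extra backend is needed.
soup = BeautifulSoup(HTML, "html.parser")
table = soup.find("table")
rows = table.find_all("tr")

# Assumption: the first row is the header row, made of <th> cells.
header = [th.get_text(strip=True) for th in rows[0].find_all("th")]

# Every remaining row becomes one record, one string per <td> cell.
records = [
    [td.get_text(strip=True) for td in tr.find_all("td")]
    for tr in rows[1:]
]

df = pd.DataFrame(records, columns=header)

# Cell text arrives as strings; convert numeric columns explicitly.
df["Population"] = pd.to_numeric(df["Population"])

print(df)
```

Parsing by hand is more code, but each step is visible, so you decide how headers, cell text, and types are handled.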

Homework

pd.read_html() is convenient but sometimes wrong. When would you prefer manual parsing?