Block 55: BeautifulSoup: HTML Parsing Basics
Parse and navigate HTML documents to extract data.
Concepts
- from bs4 import BeautifulSoup
- BeautifulSoup(html, 'html.parser')
- find() and find_all() by tag and attribute
- Extracting text and attributes: .text, .get('href')
Code Examples
See exercise below.
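Before the exercise, here is a minimal sketch of the calls listed under Concepts. The HTML string, the class name "title", and the URLs are invented for illustration; only the BeautifulSoup calls come from this block.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, made up for this sketch.
html = """
<html><body>
  <h2 class="title">First heading</h2>
  <h2>Second heading</h2>
  <a href="https://example.com/a">Link A</a>
  <a href="/relative/b">Link B</a>
</body></html>
"""

# Parse with Python's built-in parser (no extra install beyond bs4).
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag.
first_h2 = soup.find("h2")

# find_all() returns every matching tag, in document order.
all_h2 = soup.find_all("h2")

# Filter by attribute: class_ has a trailing underscore because
# "class" is a reserved word in Python.
titled = soup.find("h2", class_="title")

# .text gives the text inside a tag; .get('href') reads an attribute.
heading_texts = [h.text for h in all_h2]
hrefs = [a.get("href") for a in soup.find_all("a")]

print(first_h2.text)
print(heading_texts)
print(hrefs)
```

Note that .get('href') returns the attribute exactly as written in the page, so relative links such as /relative/b stay relative.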
Exercise
Fetch a static web page and print all of its <h2> headings. Then extract all hyperlinks from the page and store them in a list.
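One possible starting point for the exercise is sketched below. It is not the only approach: the helper names fetch_html, h2_headings, hyperlinks, and main are hypothetical, and the standard library's urllib is assumed for fetching (the requests package would work just as well). The parsing helpers take an HTML string, so they can be tried without a network connection.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def fetch_html(url):
    """Download a page and return its raw bytes (hypothetical helper)."""
    with urlopen(url, timeout=10) as response:
        # BeautifulSoup accepts bytes and works out the encoding itself.
        return response.read()


def h2_headings(html):
    """Return the text of every <h2> tag, stripped of surrounding whitespace."""
    soup = BeautifulSoup(html, "html.parser")
    return [h2.get_text(strip=True) for h2 in soup.find_all("h2")]


def hyperlinks(html):
    """Return the href of every <a> tag that actually has an href attribute."""
    soup = BeautifulSoup(html, "html.parser")
    # href=True skips anchors without an href, so the list contains no None.
    return [a.get("href") for a in soup.find_all("a", href=True)]


def main(url):
    """Fetch one page, print its <h2> headings, and return its links as a list."""
    html = fetch_html(url)
    for heading in h2_headings(html):
        print(heading)
    return hyperlinks(html)
```

To run it, call main() with the URL of any static page you are allowed to fetch and inspect the returned list. Pages that build their content with JavaScript will not show that content here, because only the delivered HTML is parsed.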
Homework
What is the difference between find() and find_all()? What does each return when nothing matches?