Block 57: Ethical Scraping & robots.txt
Understand the legal and ethical boundaries of web scraping.
Concepts
- Reading robots.txt to understand allowed/disallowed paths
- Respecting the Crawl-delay directive and pausing between requests
- Terms of Service and public vs private data
- Avoiding server overload by limiting request rate and concurrency
Code Examples
See exercise below.
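As a starting point, here is a minimal sketch of checking paths and the crawl delay with Python's standard-library urllib.robotparser. The robots.txt content, the example.com URLs, the course-demo-bot user agent, and the polite_delay helper are all hypothetical and chosen for illustration; the rules are inlined so the sketch runs without any network access.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs offline.
# The narrow Disallow is listed before the broad Allow so that
# order-based and longest-match parsers reach the same answer.
SAMPLE_ROBOTS = """\
User-agent: *
Crawl-delay: 2
Disallow: /private/
Allow: /
"""

USER_AGENT = "course-demo-bot"  # hypothetical user agent name


def build_parser(robots_text):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def polite_delay(parser, user_agent, default=1.0):
    """Return the site's Crawl-delay in seconds, or a fallback if none is set."""
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default


parser = build_parser(SAMPLE_ROBOTS)

# can_fetch() answers: may this user agent request this URL?
allowed_blog = parser.can_fetch(USER_AGENT, "https://example.com/blog/post")
allowed_private = parser.can_fetch(USER_AGENT, "https://example.com/private/data")
delay = polite_delay(parser, USER_AGENT)

print("blog allowed:", allowed_blog)
print("private allowed:", allowed_private)
print("seconds to wait between requests:", delay)
```

In a real scraper, one reasonable pattern is to call can_fetch() before every request and to sleep for the returned delay between requests. For a live site, set_url() followed by read() can replace the inlined text.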
Exercise
Fetch and print the robots.txt file for three websites you use. For each site, identify which paths are disallowed and which are allowed, and note which user agents each rule applies to.
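One possible way to approach the exercise is sketched below. The function names fetch_robots and summarize_rules, the course-demo-bot user agent, and the sample rules are assumptions made for illustration. The summary logic is deliberately simplified: it groups only Allow and Disallow lines under the User-agent lines that precede them and ignores other fields such as Sitemap and Crawl-delay. The fetch function makes a network request, so it is defined here but not called.

```python
from urllib.request import Request, urlopen


def fetch_robots(base_url, user_agent="course-demo-bot", timeout=10):
    """Download robots.txt from a site root (makes a network request)."""
    request = Request(
        base_url.rstrip("/") + "/robots.txt",
        headers={"User-Agent": user_agent},
    )
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def summarize_rules(robots_text):
    """Group Allow/Disallow paths by the User-agent line(s) they follow."""
    rules = {}
    agents = []        # user agents of the group currently being read
    in_rules = False   # True once the current group has seen a rule line
    for raw in robots_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:
                # A User-agent line after rule lines starts a new group.
                agents = []
                in_rules = False
            agents.append(value)
            rules.setdefault(value, {"allow": [], "disallow": []})
        elif field in ("allow", "disallow") and agents:
            in_rules = True
            if value:  # an empty Disallow means nothing is blocked
                for agent in agents:
                    rules[agent][field].append(value)
    return rules


# Hypothetical sample so the sketch runs offline.
SAMPLE = """\
# hypothetical example
User-agent: *
Disallow: /admin/
Disallow: /cart/
Allow: /admin/help

User-agent: BadBot
Disallow: /
"""

summary = summarize_rules(SAMPLE)
for agent, paths in summary.items():
    print(agent, "| disallow:", paths["disallow"], "| allow:", paths["allow"])
```

To try this against a real site, pass the result of fetch_robots("https://example.com") to summarize_rules, replacing the placeholder URL with a site you use.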
Homework
Write a 150-word reflection: When is web scraping ethical? What lines should never be crossed?