Block 105: spaCy: Pipelines & POS Tagging
Use spaCy to run a text-processing pipeline: tokenization, part-of-speech (POS) tagging, dependency parsing, and sentence segmentation.
Concepts
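- Install: pip install spacy, then python -m spacy download en_core_web_sm (the small English model)
- nlp = spacy.load("en_core_web_sm") loads the pipeline; nlp(text) runs it and returns a Doc object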
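- Token attributes: .text (the token string), .pos_ (coarse part-of-speech tag), .dep_ (dependency relation to the token's head), .is_stop (True for stop words)
- Sentence segmentation: iterate over doc.sents (needs a parser or sentencizer in the pipeline to set sentence boundaries)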
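The Doc, token, and sentence concepts above can be tried without downloading a model. The following is a minimal sketch, assuming spaCy v3: it uses spacy.blank("en"), a tokenizer-only English pipeline, plus the rule-based sentencizer component. The sample text is made up for illustration. A blank pipeline has no tagger or parser, so .pos_ and .dep_ stay empty here.

```python
import spacy

# Tokenizer-only English pipeline; no trained model is required.
nlp = spacy.blank("en")
# Rule-based component that sets sentence boundaries at punctuation.
nlp.add_pipe("sentencizer")

# Calling the pipeline on a string returns a Doc object.
doc = nlp("The cat sat on the mat. It purred.")

# A Doc is a sequence of Token objects.
tokens = [token.text for token in doc]
# is_stop is a lexical attribute, so it works without a trained model.
stop_words = [token.text for token in doc if token.is_stop]
# doc.sents yields one Span per sentence.
sentences = [sent.text for sent in doc.sents]

print(tokens)
print(stop_words)
print(sentences)
```

Swapping spacy.blank("en") for spacy.load("en_core_web_sm") keeps the same interface and additionally fills in the tagging and parsing attributes.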
Code Examples
See exercise below.
Exercise
Parse a paragraph with spaCy. List all nouns, all verbs, and all proper nouns. Compare spaCy's tokenization with NLTK's for the same sentence.
Homework
What is a dependency tree? Draw the dependency tree for 'The cat sat on the mat'.