Block 107: Word Embeddings: Word2Vec & GloVe Concepts
Understand dense vector representations of words.
Concepts
- Limitations of bag-of-words: sparse counts carry no notion of meaning or word similarity
- Word embeddings: similar words → close vectors
- Using pre-trained embeddings via gensim or spaCy
- Cosine similarity between word vectors
Code Examples
The sketch below illustrates the core ideas; the exercise then builds on it.
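A minimal sketch, assuming the medium English model has been installed (`python -m spacy download en_core_web_md`). It inspects a word's dense vector, then compares word pairs by cosine similarity, both via spaCy's built-in Doc.similarity and by hand:

```python
import numpy as np
import spacy

# Assumes: python -m spacy download en_core_web_md
nlp = spacy.load("en_core_web_md")

# Each word maps to a dense 300-dimensional float vector.
cat_vec = nlp.vocab["cat"].vector
print(cat_vec.shape, cat_vec.dtype)  # (300,) float32

# Doc.similarity computes cosine similarity over the word vectors.
cat, dog, car = nlp("cat"), nlp("dog"), nlp("car")
print("cat vs dog:", cat.similarity(dog))  # related words -> higher score
print("cat vs car:", cat.similarity(car))  # unrelated words -> lower score

# The same measure, computed by hand from the raw vectors.
def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

print("cat vs dog (manual):", cosine(cat_vec, nlp.vocab["dog"].vector))
```

Expect cat/dog to score noticeably higher than cat/car: related words end up pointing in similar directions in the vector space, which is exactly what bag-of-words counts cannot express.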
Exercise
Use spaCy's medium model (en_core_web_md) to find the five most similar words to 'king', 'doctor', and 'python'. Then compute the cosine similarity for the 'cat'/'dog' and 'cat'/'car' vector pairs. A starter sketch follows.
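One possible starting point for the neighbour lookup, assuming spaCy v3's Vectors.most_similar API. The helper name and the n + 1 over-fetch (to skip the query word itself) are illustrative choices, not part of spaCy:

```python
import numpy as np
import spacy

nlp = spacy.load("en_core_web_md")

def most_similar(word, n=5):
    # Query the static vector table for the nearest neighbours by cosine.
    vec = nlp.vocab[word].vector
    keys, _, scores = nlp.vocab.vectors.most_similar(np.asarray([vec]), n=n + 1)
    words = [nlp.vocab.strings[int(k)] for k in keys[0]]
    # The query word usually comes back as its own nearest neighbour; drop it.
    return [(w, float(s)) for w, s in zip(words, scores[0])
            if w.lower() != word][:n]

for query in ("king", "doctor", "python"):
    print(query, "->", most_similar(query))
```

The vector table also stores case and inflection variants (e.g. 'King', 'kings'), so expect some near-duplicates among the neighbours; filtering them out is a good extension of the exercise.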
Homework
Explain the famous word2vec analogy: king - man + woman ≈ queen. What does this reveal about the structure of the embedding space? A sketch for checking the analogy empirically follows.
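To test the analogy empirically, gensim's KeyedVectors.most_similar accepts positive and negative word lists. This sketch assumes the downloadable glove-wiki-gigaword-100 vectors, which gensim fetches and caches on first use:

```python
import gensim.downloader as api

# Downloads and caches the GloVe vectors on first use.
wv = api.load("glove-wiki-gigaword-100")

# king - man + woman: add the positive vectors, subtract the negative one,
# then return the nearest neighbours of the resulting point.
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```

'queen' typically appears at or near the top, suggesting that consistent directions in the embedding space encode relations such as gender.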