Vector Representations Quiz — Natural Language Processing
Question 1
Which of the following best describes the fundamental difference in how static word embeddings (e.g., Word2Vec, GloVe) and contextual embeddings (e.g., BERT, ELMo) represent the meaning of a word?
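For reference, here is a minimal toy sketch (not a trained model; the vectors and the averaging rule are invented purely for illustration) of the distinction: a static embedding is a fixed lookup table that ignores the sentence, while a contextual encoder produces a vector that changes with the surrounding words.

```python
import numpy as np

# Hypothetical 3-dimensional "embeddings" for illustration only.
static_emb = {
    "bank": np.array([0.8, 0.1, 0.3]),
    "river": np.array([0.7, 0.2, 0.1]),
    "money": np.array([0.1, 0.9, 0.4]),
}

def static_vector(word, sentence):
    # Static models (Word2Vec, GloVe): one vector per word type.
    # The sentence is ignored, so "bank" gets the same vector everywhere.
    return static_emb[word]

def contextual_vector(word, sentence):
    # Crude stand-in for a contextual encoder (BERT, ELMo): the output
    # depends on the other words in the sentence, so the same word type
    # can receive different vectors in different contexts.
    context = [static_emb[w] for w in sentence if w in static_emb]
    return 0.5 * static_emb[word] + 0.5 * np.mean(context, axis=0)

print(static_vector("bank", ["river", "bank"]))      # identical to the next line
print(static_vector("bank", ["money", "bank"]))
print(contextual_vector("bank", ["river", "bank"]))  # differs from the next line
print(contextual_vector("bank", ["money", "bank"]))
```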
Question 2
Given a vector space model, if two word vectors, $\vec{w_1}$ and $\vec{w_2}$, have a high cosine similarity, what can be inferred about their semantic relationship?
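As a worked example, cosine similarity measures the angle between two vectors, $\cos\theta = \dfrac{\vec{w_1} \cdot \vec{w_2}}{\lVert \vec{w_1} \rVert \, \lVert \vec{w_2} \rVert}$; the short sketch below uses hypothetical vectors purely to show the computation.

```python
import numpy as np

def cosine_similarity(w1, w2):
    # cos(theta) = (w1 . w2) / (||w1|| * ||w2||), ranges over [-1, 1]
    return np.dot(w1, w2) / (np.linalg.norm(w1) * np.linalg.norm(w2))

# Hypothetical embedding vectors, chosen only to illustrate the contrast.
w_king = np.array([0.9, 0.7, 0.1])
w_queen = np.array([0.85, 0.75, 0.2])
w_banana = np.array([0.1, 0.05, 0.95])

print(cosine_similarity(w_king, w_queen))   # high: vectors point in similar directions
print(cosine_similarity(w_king, w_banana))  # low: vectors point in different directions
```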
Question 3
In the context of distributed representations, what is the primary reason for using a 'window size' during the training of models like Word2Vec?
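For reference, a minimal sketch of how a window size determines which (center, context) training pairs a Skip-gram-style model sees; the sentence and window value are arbitrary examples.

```python
def skipgram_pairs(tokens, window_size):
    """Generate (center, context) pairs: only words within `window_size`
    positions of the center word are treated as its context."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window_size)
        hi = min(len(tokens), i + window_size + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

sentence = "the cat sat on the mat".split()
print(skipgram_pairs(sentence, window_size=2))
```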
Question 4
Consider a scenario where a word embedding model is trained on a very large corpus, resulting in a vocabulary size of $10^6$ words. What is a significant computational challenge associated with this large vocabulary size, particularly for traditional neural network-based embedding models?
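To make the scale concrete, the sketch below counts the work a full softmax output layer would need for a $10^6$-word vocabulary, and contrasts it with negative sampling (the embedding dimension $d = 300$ and the number of negatives $k = 5$ are typical illustrative values, not fixed by the question).

```python
V, d = 1_000_000, 300   # vocabulary size and embedding dimension (d is an assumption)

# A full softmax output layer stores V x d weights and must score every
# vocabulary word for each training example.
params = V * d
mults_per_example = V * d
print(f"output-layer parameters: {params:,}")            # 300,000,000
print(f"multiply-adds per full softmax: {mults_per_example:,}")

# Negative sampling (used by Word2Vec) scores only the true context word
# plus k sampled negatives, avoiding the sum over the whole vocabulary.
k = 5
print(f"multiply-adds with negative sampling: {(k + 1) * d:,}")  # 1,800
```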
Question 5
Which of the following is a key trade-off when choosing between a larger versus a smaller 'window size' in word embedding models like Skip-gram?
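As a rough illustration of that trade-off (example sentence and window values are arbitrary): a small window admits only immediate neighbours, so the resulting embeddings tend toward functional or syntactic similarity and training is cheaper, while a large window pulls in distant, topical words at a higher computational cost.

```python
def context_words(tokens, index, window_size):
    # Words counted as context for the token at `index` under a given window.
    lo = max(0, index - window_size)
    hi = min(len(tokens), index + window_size + 1)
    return [tokens[j] for j in range(lo, hi) if j != index]

sentence = ("the central bank raised interest rates to slow inflation "
            "across the national economy").split()
i = sentence.index("bank")

# Small window: mostly immediate syntactic neighbours, fewer training pairs.
print(context_words(sentence, i, window_size=2))

# Large window: distant, topical words enter the context, more training pairs.
print(context_words(sentence, i, window_size=8))
```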