Word2Vec
- is a method to efficiently create word embeddings
- introduced by Google researchers in 2013
- uses shallow, two-layer feed-forward neural networks that are trained to reconstruct the linguistic contexts of words
Word2Vec - Modes
Word2Vec can utilize either of two model architectures:
- continuous bag-of-words (CBoW) - the model predicts the current word from a window of surrounding context words
- continuous skip-gram - the model uses the current word to predict the surrounding window of context words
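To make the difference between the two modes concrete, the following is a minimal sketch of how training examples could be derived from a tokenized sentence. The function names `cbow_pairs` and `skipgram_pairs`, the `window` parameter, and the sample sentence are assumptions made for illustration; they are not part of any library, and a real implementation adds details such as subsampling and negative sampling.

```python
def cbow_pairs(tokens, window=2):
    """Yield (context_words, target_word): the context predicts the current word."""
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        context = left + right
        if context:
            yield context, target


def skipgram_pairs(tokens, window=2):
    """Yield (current_word, context_word): the current word predicts each neighbor."""
    for i, center in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        for context_word in left + right:
            yield center, context_word


# Hypothetical example sentence, already split into tokens.
tokens = "the quick brown fox jumps".split()

cbow = list(cbow_pairs(tokens, window=1))
skipgram = list(skipgram_pairs(tokens, window=1))

print(cbow[1])       # one CBoW example: several context words, one target
print(skipgram[:3])  # skip-gram examples: one (current, context) pair each
```

CBoW produces one example per position with the whole window as input, while skip-gram produces one example per (current word, neighbor) combination, so it yields more training pairs from the same text.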
Word2Vec - Architectures
[Figure: Word2Vec CBoW neural network architecture]
Word2Vec - Code
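from gensim.models import KeyedVectors

# Load the pretrained Google News vectors from a file in binary word2vec format.
model = KeyedVectors.load_word2vec_format('data/GoogleNews-vectors-negative300.bin', binary=True)
# most_similar returns a list of (word, cosine similarity) tuples, the top 10 by default.
similar_words = model.most_similar('robots')
print(similar_words)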
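The ranking returned by `most_similar` is based on cosine similarity between word vectors. As a rough sketch of that idea without the large pretrained file, the snippet below ranks a few made-up 3-dimensional vectors. The `toy_vectors` values and the helper `cosine_similarity` are assumptions for illustration only; real Word2Vec vectors are learned and have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings, invented for this example (not real model output).
toy_vectors = {
    "robots": [0.9, 0.1, 0.3],
    "androids": [0.8, 0.2, 0.4],
    "banana": [0.1, 0.9, 0.0],
}

query = toy_vectors["robots"]
# Score every other word against the query and sort by descending similarity.
ranked = sorted(
    (
        (word, cosine_similarity(query, vector))
        for word, vector in toy_vectors.items()
        if word != "robots"
    ),
    key=lambda pair: pair[1],
    reverse=True,
)
print(ranked)
```

Words whose vectors point in nearly the same direction as the query score close to 1, and unrelated words score closer to 0.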