The objective of the skip-gram model is to find word representations that are useful for predicting the surrounding words, i.e., the 'context' of a given word.
Given a sequence of training words \(w_1, w_2, \ldots, w_T\), the objective is to maximize the average log probability

$$ \frac{1}{T} \sum_{t=1}^{T} \sum_{-c \le j \le c,\; j \neq 0} \log p(w_{t+j} \mid w_t) $$

where \(c\) is the size of the training context.
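In the basic formulation, \(p(w_{t+j} \mid w_t)\) is defined as a softmax over inner products of 'input' (center) and 'output' (context) word vectors. As a minimal sketch (assuming NumPy; names such as `skipgram_log_likelihood` are illustrative, not from any library), the objective can be evaluated directly with this full softmax on a toy corpus; real implementations such as word2vec replace the full softmax with hierarchical softmax or negative sampling for efficiency.

```python
import numpy as np

def skipgram_log_likelihood(corpus, v_in, v_out, c=2):
    """Average log probability of context words given center words.

    corpus : list of word indices w_1 ... w_T
    v_in   : (V, d) array of input ("center") vectors
    v_out  : (V, d) array of output ("context") vectors
    c      : context window size
    """
    T = len(corpus)
    total = 0.0
    for t, w_t in enumerate(corpus):
        # Scores of every vocabulary word against the center word w_t.
        scores = v_out @ v_in[w_t]                           # shape (V,)
        log_probs = scores - np.log(np.sum(np.exp(scores)))  # log softmax
        for j in range(-c, c + 1):
            if j == 0 or not 0 <= t + j < T:
                continue
            total += log_probs[corpus[t + j]]
    return total / T

# Toy example: vocabulary of 5 word types, random 8-dimensional vectors.
rng = np.random.default_rng(0)
V, d = 5, 8
v_in, v_out = rng.normal(size=(V, d)), rng.normal(size=(V, d))
print(skipgram_log_likelihood([0, 2, 1, 3, 4, 2, 0], v_in, v_out))
```

Training maximizes this quantity with respect to the word vectors (e.g., by gradient ascent); with the full softmax each update costs \(O(V)\) per word, which is why hierarchical softmax and negative sampling are used in practice.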