The objective of the skip-gram model is to find word representations that are useful for predicting the surrounding words — the 'context'.
Given a sequence of training words \(w_1, w_2, \ldots, w_T\), the objective is to maximize the average log probability
$$ \frac{1}{T} \sum_{t=1}^{T} \sum_{-c \le j \le c,\; j \neq 0} \log p(w_{t+j} \mid w_t) $$
where \(c\) is the size of the training context.
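As a minimal sketch of this objective (not the original word2vec implementation), the sum can be computed with a full softmax over two embedding matrices. The names `W_in`, `W_out`, and the window size `c` are illustrative assumptions, not part of the source:

```python
import numpy as np

def skipgram_objective(tokens, W_in, W_out, c=2):
    """Average log probability of context words within a window of size c.

    tokens : list of int word indices (the training sequence w_1..w_T)
    W_in   : (V, d) input ("center" word) embeddings -- assumed names
    W_out  : (V, d) output ("context" word) embeddings
    """
    T = len(tokens)
    total = 0.0
    for t, w_t in enumerate(tokens):
        # p(. | w_t) via a full softmax over the whole vocabulary
        scores = W_out @ W_in[w_t]                      # (V,)
        scores -= scores.max()                          # numerical stability
        log_probs = scores - np.log(np.exp(scores).sum())
        # sum log p(w_{t+j} | w_t) over the window, skipping j = 0
        for j in range(-c, c + 1):
            if j == 0 or not (0 <= t + j < T):
                continue
            total += log_probs[tokens[t + j]]
    return total / T

# Toy usage with random embeddings and a random token sequence
rng = np.random.default_rng(0)
V, d = 50, 8
W_in, W_out = rng.normal(size=(V, d)), rng.normal(size=(V, d))
tokens = rng.integers(0, V, size=100).tolist()
print(skipgram_objective(tokens, W_in, W_out, c=2))
```

Note that evaluating the full softmax costs \(O(V)\) per term, which is why in practice it is replaced by cheaper approximations such as hierarchical softmax.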