The objective of the skip-gram model is to find representations of words that are useful for predicting the surrounding words (the 'context').
Given a sequence of training words \(w_1, w_2, \ldots, w_T\), the objective is to maximize the average log probability
$$ \frac{1}{T} \sum_{t=1}^{T} \sum_{-c \le j \le c,\, j \neq 0} \log p(w_{t+j} \mid w_t) $$
where \(c\) is the size of the training context window.
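As a concrete illustration, here is a minimal NumPy sketch that evaluates this objective on a toy corpus. The corpus, embedding dimension, window size \(c\), and the full-softmax parameterization of \(p(w_{t+j} \mid w_t)\) (dot products between input and output embedding vectors) are illustrative assumptions, not the reference word2vec implementation.

```python
# Minimal sketch of the skip-gram objective on a toy corpus.
# Assumptions: random embeddings, full softmax for p(context | center).
import numpy as np

rng = np.random.default_rng(0)

corpus = ["the", "quick", "brown", "fox", "jumps"]
vocab = {w: i for i, w in enumerate(sorted(set(corpus)))}
V, d, c = len(vocab), 8, 2  # vocab size, embedding dim, window size

W_in = rng.normal(scale=0.1, size=(V, d))   # input ("center") vectors
W_out = rng.normal(scale=0.1, size=(V, d))  # output ("context") vectors

def log_p(context, center):
    """log p(w_context | w_center) via a full softmax over the vocabulary."""
    scores = W_out @ W_in[center]   # dot product with every output vector
    scores -= scores.max()          # shift for numerical stability
    return scores[context] - np.log(np.exp(scores).sum())

# Average log probability over all (center, context) pairs in the window.
T = len(corpus)
objective = 0.0
for t, w in enumerate(corpus):
    for j in range(-c, c + 1):
        if j == 0 or not (0 <= t + j < T):
            continue
        objective += log_p(vocab[corpus[t + j]], vocab[w])
objective /= T

print(f"average log probability: {objective:.4f}")
```

In practice the full softmax is too expensive for large vocabularies, which is why word2vec replaces it with approximations such as hierarchical softmax or negative sampling; this sketch only makes the objective itself concrete.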