Installation #
Testing whether the fast version is installed:
#!python
>>> from gensim.models import word2vec
>>> assert word2vec.FAST_VERSION > -1
Models #
Phrases #
This model detects multi-word phrases that can be grouped into a single token, such as new_york_times. It can be used as a preprocessor for word2vec or doc2vec models.
#!python
>>> import gensim
>>> from gensim.models import Word2Vec
>>> bigram_transformer = gensim.models.Phrases(sentences)
>>> model = Word2Vec(bigram_transformer[sentences], size=100, ...)
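To make the idea of phrase detection concrete, here is a minimal pure-Python sketch. It is not gensim's implementation: it assumes the bigram score from Mikolov et al. (2013), (count(ab) - delta) / (count(a) * count(b)). The function name merge_bigrams, the toy sentences, and the delta and threshold values are all made up for illustration.

```python
from collections import Counter

def merge_bigrams(sentences, delta=1, threshold=0.1):
    """Join adjacent word pairs whose score exceeds `threshold` (illustrative only)."""
    unigrams = Counter(w for s in sentences for w in s)
    bigrams = Counter(pair for s in sentences for pair in zip(s, s[1:]))

    def score(a, b):
        # Assumed scoring rule: (count(ab) - delta) / (count(a) * count(b)).
        return (bigrams[(a, b)] - delta) / (unigrams[a] * unigrams[b])

    merged = []
    for s in sentences:
        out, i = [], 0
        while i < len(s):
            if i + 1 < len(s) and score(s[i], s[i + 1]) > threshold:
                out.append(s[i] + "_" + s[i + 1])  # group the pair into one token
                i += 2
            else:
                out.append(s[i])
                i += 1
        merged.append(out)
    return merged

# Hypothetical toy corpus: "new york" occurs twice, every other pair once.
toy_sentences = [
    ["i", "love", "new", "york"],
    ["new", "york", "is", "big"],
    ["we", "love", "big", "cities"],
]
merged = merge_bigrams(toy_sentences)
```

With this toy corpus only the pair (new, york) scores above the threshold, so it is the only pair rewritten as new_york; pairs seen once score zero because of the delta discount.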
word2vec #
A Vocab object represents a word together with its frequency (count) and other properties (e.g. sample_int, which is used for sampling).
Let V be the size of the vocabulary and N the dimension of the hidden layer (the vector dimension).
model.syn0: a \( V \times N \) matrix. model.syn0[wordindex] returns the word vector for the word at index wordindex.
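The layout described above can be pictured with a small stand-in that does not use gensim. Everything in this sketch (the toy vocabulary, the word_to_index mapping, the random initialization, and the word_vector helper) is an assumption made for illustration; only the V-by-N shape and the row lookup come from the notes above.

```python
import random

# Hypothetical toy vocabulary; the word-to-index mapping is an assumption of this sketch.
vocab = ["king", "queen", "apple"]
word_to_index = {word: i for i, word in enumerate(vocab)}

V, N = len(vocab), 4  # vocabulary size and vector dimension
rng = random.Random(0)

# Stand-in for model.syn0: V rows, each an N-dimensional word vector.
syn0 = [[rng.uniform(-0.5, 0.5) / N for _ in range(N)] for _ in range(V)]

def word_vector(word):
    """Return the row of syn0 that belongs to `word`."""
    return syn0[word_to_index[word]]

queen_vector = word_vector("queen")
```

Looking up a word is then just a row index into the matrix, which is what model.syn0[wordindex] does in the description above.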
doc2vec #
Doc2Vec class #
_do_train_job(self, job, alpha, inits): job is just the sentences to train on.
DocvecsArray #
The document vectors are stored in this object.
indexed_doctags(self, doctag_tokens): given doctag_tokens (a list of document tags), returns the tuple (integer indexes, doctag_syn0, self.doctag_syn0_lockf, doctag_tokens).
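The shape of that return value can be sketched with a toy class. This is a hypothetical stand-in, not gensim's DocvecsArray: the class name ToyDocvecs, the tag_to_index mapping, the zero-initialized vectors, the meaning given to the lock factors, and the choice to skip unknown tags are all assumptions made for this example.

```python
class ToyDocvecs:
    """Hypothetical stand-in for DocvecsArray; not gensim's implementation."""

    def __init__(self, tags, vector_size):
        self.tag_to_index = {tag: i for i, tag in enumerate(tags)}
        # One vector per document tag.
        self.doctag_syn0 = [[0.0] * vector_size for _ in tags]
        # One lock factor per document tag (assumed here: 1.0 means trainable).
        self.doctag_syn0_lockf = [1.0] * len(tags)

    def indexed_doctags(self, doctag_tokens):
        # Unknown tags are skipped; that is a choice made for this sketch.
        indexes = [self.tag_to_index[t] for t in doctag_tokens
                   if t in self.tag_to_index]
        return indexes, self.doctag_syn0, self.doctag_syn0_lockf, doctag_tokens

docvecs = ToyDocvecs(["doc_a", "doc_b", "doc_c"], vector_size=3)
indexes, vectors, lockf, tokens = docvecs.indexed_doctags(["doc_c", "doc_a"])
```

The caller gets back integer row indexes plus references to the shared arrays, so a training routine can update the rows in place without looking up the tags again.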