The main question is this: how can we identify terms that are overrepresented in a given set of documents?
Methods #
Log odds ratio informative Dirichlet prior #
Use this when you want to contrast two corpora (e.g. Democrats vs. Republicans). It needs a (big) background corpus, whose word counts serve as the prior. It seems to work really well in many cases.
An interesting application to restaurant reviews: [Narrative framing of consumer sentiment in online restaurant reviews](http://uncommonculture.org/ojs/index.php/fm/article/view/4944/3863)
code: https://gist.github.com/yy/a2fff314073c4806fd5b
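
For orientation, here is a minimal sketch of the idea, written independently of the gist above and following the usual formulation of the method (Monroe et al., "Fightin' Words"). Each word's log odds is computed in both corpora after adding pseudo-counts taken from the background corpus, and the difference is divided by its approximate standard deviation to give a z-score. The function name `log_odds_dirichlet` and its arguments are hypothetical, and the sketch assumes the background counts cover the whole vocabulary of interest.

```python
import math
from collections import Counter


def log_odds_dirichlet(counts_a, counts_b, prior):
    """Return {word: z-score}; positive means overrepresented in corpus A.

    counts_a, counts_b: word counts of the two corpora being contrasted.
    prior: word counts from the background corpus (assumed to be > 0 for
           every word we want to score).
    """
    vocab = [w for w, a_w in prior.items() if a_w > 0]
    n_a = sum(counts_a.get(w, 0) for w in vocab)  # corpus A size over vocab
    n_b = sum(counts_b.get(w, 0) for w in vocab)  # corpus B size over vocab
    a_0 = sum(prior[w] for w in vocab)            # total prior mass

    z_scores = {}
    for w in vocab:
        a_w = prior[w]
        y_a = counts_a.get(w, 0)
        y_b = counts_b.get(w, 0)
        # Smoothed log odds of the word in each corpus.
        log_odds_a = math.log((y_a + a_w) / (n_a + a_0 - y_a - a_w))
        log_odds_b = math.log((y_b + a_w) / (n_b + a_0 - y_b - a_w))
        # Approximate variance of the difference of the two log odds.
        variance = 1.0 / (y_a + a_w) + 1.0 / (y_b + a_w)
        z_scores[w] = (log_odds_a - log_odds_b) / math.sqrt(variance)
    return z_scores


if __name__ == "__main__":
    # Toy counts, made up purely for illustration.
    a = Counter({"tax": 10, "the": 50, "health": 2})
    b = Counter({"tax": 2, "the": 50, "health": 10})
    background = Counter({"tax": 5, "the": 100, "health": 5})
    for word, z in sorted(log_odds_dirichlet(a, b, background).items(),
                          key=lambda kv: -kv[1]):
        print(f"{word}\t{z:+.2f}")
```

Sorting by z-score puts the words most characteristic of the first corpus at the top and those of the second corpus at the bottom; words used equally often in both, such as "the" in the toy data, land near zero.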
tf-idf #
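A minimal sketch of one common tf-idf variant (relative term frequency times the log of the inverse document frequency), added here only as an illustration. The function name `tf_idf` is hypothetical. To score a set of documents rather than a single one, one option is to concatenate each set into a single "document" before calling it.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: score} dict per doc."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    scores = []
    for tokens in docs:
        tf = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return scores


if __name__ == "__main__":
    # Toy documents, made up purely for illustration.
    docs = [["tax", "cut", "tax"], ["tax", "health"]]
    for doc_scores in tf_idf(docs):
        print(doc_scores)
```

A term that occurs in every document gets a score of zero under this variant, which is why it needs more than two or three documents to be informative.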
Pointwise mutual information #
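A minimal sketch, assuming the PMI between a term and the target set is defined as log2(p(w | target) / p(w)), with p(w) estimated from the target set and the remaining documents combined. The function name `pmi_scores` and its arguments are hypothetical. Note that unsmoothed PMI tends to favor rare terms, so a minimum count threshold is often applied in practice.

```python
import math
from collections import Counter


def pmi_scores(target_counts, rest_counts):
    """Return {term: PMI(term, target set)} for terms seen in the target set.

    target_counts: word counts of the document set of interest.
    rest_counts:   word counts of all other documents.
    """
    n_target = sum(target_counts.values())
    n_total = n_target + sum(rest_counts.values())

    scores = {}
    for term, count in target_counts.items():
        if count == 0:
            continue
        p_term_given_target = count / n_target
        p_term = (count + rest_counts.get(term, 0)) / n_total
        scores[term] = math.log2(p_term_given_target / p_term)
    return scores


if __name__ == "__main__":
    # Toy counts, made up purely for illustration.
    target = Counter({"senate": 4, "vote": 4})
    rest = Counter({"vote": 4, "menu": 4})
    print(pmi_scores(target, rest))
```

A positive score means the term is more frequent in the target set than in the collection as a whole; a score of zero means it is used at the same rate.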