The main question is this: how can we identify terms that are overrepresented in a given set of documents?
Log odds ratio informative Dirichlet prior
Use this when you want to contrast two corpora (e.g. Democrats vs. Republicans). It needs a (big) background corpus, which supplies the prior word counts. Seems to work really well in many cases.
- paper: Fightin' Words: Lexical Feature Selection and Evaluation for Identifying the Content of Political Conflict
An interesting application to restaurant reviews: [Narrative framing of consumer sentiment in online restaurant reviews](http://uncommonculture.org/ojs/index.php/fm/article/view/4944/3863)
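
A minimal sketch of the scoring step, to make the method concrete. It follows the z-scored log odds ratio described in the Fightin' Words paper, with the background corpus counts used directly as the Dirichlet prior. The function name `log_odds_dirichlet`, the `floor` parameter (a small pseudo-count for words missing from the background) and the toy corpora are assumptions made for illustration; they do not come from the paper or from any library.

```python
import math
from collections import Counter


def log_odds_dirichlet(counts_a, counts_b, background, floor=0.01):
    """Return {word: z-score}. Positive means overrepresented in corpus A,
    negative means overrepresented in corpus B.

    counts_a, counts_b: word counts for the two corpora to contrast.
    background: word counts of the background corpus (the prior).
    floor: assumed pseudo-count so that no prior is exactly zero.
    """
    vocab = set(counts_a) | set(counts_b)
    # Prior count for every word seen anywhere, plus the small floor.
    prior = {w: background.get(w, 0) + floor for w in vocab | set(background)}
    alpha_0 = sum(prior.values())
    n_a = sum(counts_a.values())
    n_b = sum(counts_b.values())

    z_scores = {}
    for w in vocab:
        y_a = counts_a.get(w, 0)
        y_b = counts_b.get(w, 0)
        alpha_w = prior[w]
        # Smoothed log odds of the word in each corpus.
        log_odds_a = math.log((y_a + alpha_w) / (n_a + alpha_0 - y_a - alpha_w))
        log_odds_b = math.log((y_b + alpha_w) / (n_b + alpha_0 - y_b - alpha_w))
        delta = log_odds_a - log_odds_b
        # Approximate variance of the difference.
        variance = 1.0 / (y_a + alpha_w) + 1.0 / (y_b + alpha_w)
        z_scores[w] = delta / math.sqrt(variance)
    return z_scores


# Toy example (made-up data): rank words from most A-like to most B-like.
corpus_a = Counter("tax tax tax cut freedom".split())
corpus_b = Counter("health health care tax".split())
background = corpus_a + corpus_b + Counter({"the": 10})
scores = log_odds_dirichlet(corpus_a, corpus_b, background)
for word, z in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{word:10s} {z:+.2f}")
```

Dividing by the standard deviation is what keeps rare words from dominating the ranking: a word seen only once or twice gets a large variance and therefore a small z-score. In practice the background is usually much larger than either corpus, and a common rule of thumb is to treat |z| > 1.96 as notable.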