We usually think of the probability of a topic given a document. For this metric we calculate the probability of documents given a topic:

\[P(d \mid k) = \frac{N(d, k)}{\sum_{d'} N(d', k)}\]

where \(N(d, k)\) is the number of tokens in document \(d\) assigned to topic \(k\).

A word that occurs many times in only a few documents may appear prominently in the sorted list of words, but may not be a good representative word for the topic. This metric compares the number of times a word occurs in a topic (measured in tokens) and the number of documents the word occurs in as that topic (instances of the word assigned to other topics are not counted). The document-based weight of a word \(w\) is proportional to the number of documents that contain at least one token of type \(w\) that is assigned to the topic. The highest ranked topic in this metric is the "polish poland danish denmark sweden swedish na norway norwegian sk red" topic, suggesting that those ill-fitting words may be isolated in a few documents. Although this metric has the same goal as coherence, the two don't appear to correlate well: bursty words aren't necessarily unrelated to the topic, they're just unusually frequent in certain contexts.

Some topics are specific, while others aren't really "topics" but language that comes up because we are writing in a certain context. Academic writing will talk about "paper abstract data", and a Wikipedia article will talk about "list links history". The difference is often measurable in terms of burstiness. A content-ful topic will occur in relatively few documents, but when it does, it will produce a lot of tokens. A "background" topic will occur in many documents and have a high overall token count, but never produce many tokens in any single document. This metric counts the frequency at which a given topic is the single most frequent topic in a document.
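The two counting steps described above (how often a topic is the single most frequent topic in a document, and token counts versus document counts for the words of one topic) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input layout (each document as a list of `(word, topic)` pairs), the function names `rank1_counts` and `token_and_doc_counts`, and the toy data are all hypothetical and not taken from any particular toolkit. Ties for the top topic are skipped, which is one possible reading of "single most frequent".

```python
from collections import Counter

# Hypothetical toy data: each document is a list of (word, topic_id) pairs,
# i.e. the topic assignment of every token after sampling.
toy_docs = [
    [("na", 0), ("na", 0), ("na", 0), ("sweden", 0), ("list", 1)],
    [("sweden", 0), ("list", 1), ("links", 1)],
    [("norway", 0), ("list", 1)],
]


def rank1_counts(docs):
    """Count, per topic, the documents in which it is the single most frequent topic."""
    rank1 = Counter()
    for doc in docs:
        per_topic = Counter(topic for _, topic in doc)
        if not per_topic:
            continue  # empty document: no topic can be ranked first
        best = max(per_topic.values())
        winners = [k for k, c in per_topic.items() if c == best]
        if len(winners) == 1:  # assumption: a tie means no single top topic
            rank1[winners[0]] += 1
    return rank1


def token_and_doc_counts(docs, topic):
    """For one topic, return (tokens per word, documents per word).

    Only tokens assigned to `topic` are counted; instances of a word
    assigned to other topics are ignored.
    """
    tokens = Counter()
    doc_freq = Counter()
    for doc in docs:
        words_here = [w for w, k in doc if k == topic]
        tokens.update(words_here)         # every token counts
        doc_freq.update(set(words_here))  # each word counts once per document
    return tokens, doc_freq


if __name__ == "__main__":
    print(rank1_counts(toy_docs))
    tokens, doc_freq = token_and_doc_counts(toy_docs, 0)
    for word in tokens:
        print(word, tokens[word], doc_freq[word])
```

In the toy data the word "na" has three tokens in topic 0 but appears in only one document, which is the kind of gap between token count and document count that marks a bursty word.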