The use of topic models to analyze domain-specific texts often requires manual validation of the latent topics to ensure that they are meaningful. We introduce a framework to support such a large-scale assessment of topical relevance. We measure the correspondence between a set of latent topics and a set of reference concepts to quantify four types of topical misalignment: junk, fused, missing, and repeated topics. Our analysis compares 10,000 topic model variants to 200 expert-provided domain concepts, and demonstrates how our framework can inform choices of model parameters, inference algorithms, and intrinsic measures of topical quality.
BibTeX
@inproceedings{2013-topic-model-diagnostics,
title = {Topic Model Diagnostics: Assessing Domain Relevance via Topical Alignment},
author = {Chuang, Jason and Gupta, Sonal and Manning, Christopher and Heer, Jeffrey},
booktitle = {International Conference on Machine Learning (ICML)},
year = {2013},
url = {https://idl.uw.edu/papers/topic-model-diagnostics}
}