Keyphrases aid the exploration of text collections by communicating salient aspects of documents and are often used to create effective visualizations of text. While prior work in HCI and visualization has proposed a variety of ways of presenting keyphrases, less attention has been paid to selecting the best descriptive terms. In this article, we investigate the statistical and linguistic properties of keyphrases chosen by human judges and determine which features are most predictive of high-quality descriptive phrases. Based on 5,611 responses from 69 graduate students describing a corpus of dissertation abstracts, we analyze characteristics of human-generated keyphrases, including phrase length, commonness, position, and part of speech. Next, we systematically assess the contribution of each feature within statistical models of keyphrase quality. We then introduce a method for grouping similar terms and varying the specificity of displayed phrases so that applications can select phrases dynamically based on the available screen space and current context of interaction. Precision-recall measures find that our technique generates keyphrases that match those selected by human judges. Crowdsourced ratings of tag cloud visualizations rank our approach above other automatic techniques. Finally, we discuss the role of HCI methods in developing new algorithmic techniques suitable for user-facing applications.
BibTeX
@article{2012-keyphrases,
title = {"Without the Clutter of Unimportant Words": Descriptive Keyphrases for Text Visualization},
author = {Chuang, Jason AND Manning, Christopher AND Heer, Jeffrey},
journal = {ACM Trans. on Computer-Human Interaction},
year = {2012},
volume = {19},
number = {3},
pages = {1--29},
url = {https://idl.uw.edu/papers/keyphrases},
doi = {10.1145/2362364.2362367}
}