UW Interactive Data Lab

Wrangler: Interactive Visual Specification of Data Transformation Scripts

Sean Kandel, Andreas Paepcke, Joseph Hellerstein, Jeffrey Heer. ACM Human Factors in Computing Systems (CHI), 2011
Figure for Wrangler: Interactive Visual Specification of Data Transformation Scripts
The Wrangler interface. The left panel contains (from top to bottom) a history of transforms, a transform selection menu, and automatically suggested transforms based on the current selection. Bold text within the transform descriptions indicates parameters that can be clicked and revised. The right panel contains an interactive data table; above each column is a data quality meter.
Abstract
Though data analysis tools continue to improve, analysts still expend an inordinate amount of time and effort manipulating data and assessing data quality issues. Such "data wrangling" regularly involves reformatting data values or layout, correcting erroneous or missing values, and integrating multiple data sources. These transforms are often difficult to specify and difficult to reuse across analysis tasks, teams, and tools. In response, we introduce Wrangler, an interactive system for creating data transformations. Wrangler combines direct manipulation of visualized data with automatic inference of relevant transforms, enabling analysts to iteratively explore the space of applicable operations and preview their effects. Wrangler leverages semantic data types (e.g., geographic locations, dates, classification codes) to aid validation and type conversion. Interactive histories support review, refinement, and annotation of transformation scripts. User study results show that Wrangler significantly reduces specification time and promotes the use of robust, auditable transforms instead of manual editing.
BibTeX
@inproceedings{2011-wrangler,
  title = {Wrangler: Interactive Visual Specification of Data Transformation Scripts},
  author = {Kandel, Sean AND Paepcke, Andreas AND Hellerstein, Joseph AND Heer, Jeffrey},
  booktitle = {ACM Human Factors in Computing Systems (CHI)},
  year = {2011},
  url = {https://idl.uw.edu/papers/wrangler},
  doi = {10.1145/1978942.1979444}
}