Tisane: Authoring Statistical Models via Formal Reasoning from Conceptual and Data Relationships

Eunice Jun; Audrey Seo; Jeffrey Heer; René Just

doi:10.1145/3491102.3501888



Eunice Jun, Audrey Seo, Jeffrey Heer, René Just. ACM Human Factors in Computing Systems (CHI), 2022


Figure: Example Tisane GUI for disambiguation, from the usage scenario. Tisane asks analysts disambiguating questions about variables that are conceptually relevant and that analysts may have overlooked in their query. (A) The left-hand panel gives an overview of the model the analyst is constructing. (B) Based on the variable relationships analysts specify (Listing 4), Tisane infers candidate main effects that may be potential confounders. Tisane asks analysts whether they would like to include these variables, explaining in a tooltip (C) why each variable may be important to include. (D) Tisane suggests interaction effects only if analysts specify moderating relationships in their specification. This way, Tisane ensures that model structures are conceptually justifiable. (E) From the data measurement relationships analysts provide (line 15 in Listing 4), Tisane automatically infers and includes random effects to increase the generalizability and external validity of statistical findings. (F) Tisane assists analysts in choosing an initial family and link function by asking them a series of questions about their dependent variable (e.g., Is the variable continuous or about count data?). To help analysts answer these questions and verify their assumptions about the data, Tisane shows a histogram of the dependent variable.

Materials

PDF | Software | Supplement | Honorable Mention Award

Abstract

Proper statistical modeling incorporates domain theory about how concepts relate and details of how data were measured. However, data analysts currently lack tool support for recording and reasoning about domain assumptions, data collection, and modeling choices in an integrated manner, leading to mistakes that can compromise scientific validity. For instance, generalized linear mixed-effects models (GLMMs) help answer complex research questions, but omitting random effects impairs the generalizability of results. To address this need, we present Tisane, a mixed-initiative system for authoring generalized linear models with and without mixed effects. Tisane introduces a study design specification language for expressing and asking questions about relationships between variables. Tisane contributes an interactive compilation process that represents relationships in a graph, infers candidate statistical models, and asks follow-up questions to disambiguate user queries to construct a valid model. In case studies with three researchers, we find that Tisane helps them focus on their goals and assumptions while avoiding past mistakes.
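
To make the graph-based inference idea concrete, here is a minimal Python sketch. It is not Tisane's API or its actual rule set: `StudyGraph`, `candidate_effects`, and the variable names are hypothetical, and the rules are deliberately simplified assumptions (a shared cause of the independent and dependent variable is flagged as a possible confounder, interactions come only from declared moderation, and each nesting group yields a random intercept).

```python
from dataclasses import dataclass, field


@dataclass
class StudyGraph:
    """Toy variable-relationship graph (hypothetical, not Tisane's API)."""
    causes: set = field(default_factory=set)     # (cause, effect) pairs
    moderates: set = field(default_factory=set)  # (moderator, iv, dv) triples
    nests: set = field(default_factory=set)      # (unit, group): unit nested in group

    def parents(self, var):
        """Direct causes of `var`."""
        return {c for (c, e) in self.causes if e == var}


def candidate_effects(graph, dv, ivs):
    """Return candidate model terms for the query `dv ~ ivs` (simplified rules)."""
    ivs = set(ivs)

    # Main effects: a variable that causes both the DV and a queried IV is a
    # possible confounder the analyst may have overlooked.
    confounders = set()
    for iv in ivs:
        confounders |= graph.parents(iv) & graph.parents(dv)
    confounders -= ivs

    # Interactions: suggested only when moderation was explicitly declared.
    interactions = {
        (m, iv) for (m, iv, d) in graph.moderates if d == dv and iv in ivs
    }

    # Random intercepts: one per grouping variable that units are nested in.
    random_intercepts = {group for (_, group) in graph.nests}

    return {
        "main": sorted(ivs | confounders),
        "suggested_confounders": sorted(confounders),
        "interactions": sorted(interactions),
        "random_intercepts": sorted(random_intercepts),
    }


# Hypothetical study: does tutoring affect test score?
graph = StudyGraph(
    causes={
        ("tutoring", "score"),
        ("motivation", "tutoring"),
        ("motivation", "score"),
    },
    moderates={("grade_level", "tutoring", "score")},
    nests={("student", "classroom")},
)
terms = candidate_effects(graph, dv="score", ivs=["tutoring"])
print(terms)
```

In this toy setup, `motivation` is surfaced as a suggested confounder even though the query named only `tutoring`. An interactive system would present such candidates as questions to the analyst rather than adding them silently.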

BibTeX

@inproceedings{2022-tisane,
  title = {Tisane: Authoring Statistical Models via Formal Reasoning from Conceptual and Data Relationships},
  author = {Jun, Eunice AND Seo, Audrey AND Heer, Jeffrey AND Just, Ren\'{e}},
  booktitle = {ACM Human Factors in Computing Systems (CHI)},
  year = {2022},
  url = {https://idl.uw.edu/papers/tisane},
  doi = {10.1145/3491102.3501888}
}