Statistical models should accurately reflect analysts' domain knowledge about variables and their relationships. While recent tools let analysts express these assumptions and use them to produce a statistical model, it remains unclear what analysts want to express and how externalization impacts statistical model quality. This paper addresses these gaps. We first conduct an exploratory study of analysts using a domain-specific language (DSL) to express conceptual models. We observe a preference for detailing how variables relate and a desire to allow, and later resolve, ambiguity in their conceptual models. We leverage these findings to develop rTisane, a DSL for expressing conceptual models augmented with an interactive disambiguation process. In a controlled evaluation, we find that rTisane's DSL helps analysts engage more deeply with their assumptions and externalize them accurately. rTisane also leads to statistical models that match analysts' assumptions, maintain analysis intent, and better fit the data.