Statistical practices such as building regression models and running hypothesis tests produce valid results only when analysts follow a rigorous sequence of steps and verify the assumptions that each method makes about the data. However, common statistical tools do not check users' analysis decisions; they provide low-level statistical functions without guidance on the analysis process as a whole. As a result, users can easily misuse analysis methods, which can reduce the validity of their results. To address this problem, we introduce GuidedStats, an interactive interface within computational notebooks that encapsulates guidance, models, visualizations, and exportable results into interactive workflows. It breaks down typical analysis processes, such as linear regression and two-sample t-tests, into interactive steps supplemented with automatic visualizations and explanations for step-wise evaluation. Users can iterate on input choices to refine their models, while recommended actions and exports allow them to continue their analysis in code. Case studies show how GuidedStats offers valuable instructions for conducting fluid statistical analyses while surfacing possible assumption violations in the underlying data, supporting flexible and accurate statistical analyses.
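To make the idea of a step-wise workflow with assumption checks concrete, the following is a minimal sketch of a two-sample t-test broken into explicit steps. It is not the GuidedStats implementation or API: the names `StepResult`, `check_sample_size`, `check_equal_variance`, `t_statistic`, and `run_workflow` are hypothetical, and the variance-ratio threshold of 4 is a rule-of-thumb assumption chosen for illustration. The sketch uses only the Python standard library and reports the test statistic and degrees of freedom, not a p-value.

```python
from dataclasses import dataclass
from math import inf, sqrt
from statistics import mean, variance


@dataclass
class StepResult:
    """Outcome of one workflow step (hypothetical structure for this sketch)."""
    name: str
    passed: bool
    detail: str


def check_sample_size(a, b, minimum=3):
    # Step 1: refuse to continue with samples too small to estimate a variance.
    ok = len(a) >= minimum and len(b) >= minimum
    return StepResult("sample size", ok, f"n_a={len(a)}, n_b={len(b)}")


def check_equal_variance(a, b, max_ratio=4.0):
    # Step 2: heuristic check (an assumption of this sketch, not a formal test):
    # treat the variances as similar when larger / smaller <= max_ratio.
    va, vb = variance(a), variance(b)
    small, large = min(va, vb), max(va, vb)
    if small == 0:
        ratio = 1.0 if large == 0 else inf
    else:
        ratio = large / small
    return StepResult("equal variance", ratio <= max_ratio,
                      f"variance ratio={ratio:.2f}")


def t_statistic(a, b, equal_var):
    # Step 3: pooled (Student) statistic if variances look similar,
    # otherwise Welch's statistic with Welch-Satterthwaite degrees of freedom.
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)
    if equal_var:
        pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
        se = sqrt(pooled * (1 / na + 1 / nb))
    else:
        se = sqrt(va / na + vb / nb)
    if se == 0:
        raise ValueError("both samples are constant; t is undefined")
    if equal_var:
        df = na + nb - 2
    else:
        df = (va / na + vb / nb) ** 2 / (
            (va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1)
        )
    return (mean(a) - mean(b)) / se, df


def run_workflow(a, b):
    # Run the steps in order and keep every intermediate result for review.
    steps = [check_sample_size(a, b)]
    if not steps[0].passed:
        return {"steps": steps, "method": None, "t": None, "df": None}
    steps.append(check_equal_variance(a, b))
    equal_var = steps[1].passed
    t, df = t_statistic(a, b, equal_var)
    method = "pooled" if equal_var else "welch"
    return {"steps": steps, "method": method, "t": t, "df": df}


if __name__ == "__main__":
    result = run_workflow([1, 2, 3, 4, 5], [0, 10, 20, 30, 40])
    for step in result["steps"]:
        print(step.name, "passed" if step.passed else "violated", step.detail)
    print(result["method"], result["t"], result["df"])
```

Keeping each check as a separate function that returns a small result record mirrors the step-wise evaluation described above: a violated assumption is reported to the user and changes which test is run, so it cannot be silently ignored.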