A multiverse analysis evaluates all combinations of "reasonable" analytic decisions to promote robustness and transparency, but it can lead to a combinatorial explosion in the number of analyses to compute. Long delays before results can be assessed prevent users from diagnosing errors and iterating early. We contribute (1) approximation algorithms for estimating multiverse sensitivity and (2) monitoring visualizations for assessing progress and controlling execution on the fly. We evaluate how quickly three sampling-based algorithms converge to an accurate ranking of sensitive decisions in both synthetic and real multiverse analyses. Compared to uniform random sampling, the round robin and sketching approaches are 2 times faster in the best case and, on average, estimate sensitivity accurately using 20% of the full multiverse. To enable analysts to stop early to fix errors or to decide when results are "good enough" to move forward, we visualize both effect size and decision sensitivity estimates with confidence intervals, and we surface potential issues, including runtime warnings and model quality metrics.
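As a rough illustration of the idea of estimating decision sensitivity from a sample of the multiverse, the following minimal sketch compares a ranking computed from all universes with one computed from a uniform random sample. Everything in it is an assumption made for illustration: the decision names, the synthetic `run_universe` function, and the sensitivity measure (variance of the mean effect size across a decision's options) are hypothetical and are not taken from the abstract, which does not define its metric or algorithms in detail.

```python
import itertools
import random
from statistics import mean, pvariance

# Hypothetical decisions and their options (illustrative only).
DECISIONS = {
    "outlier_rule": ["none", "2sd", "3sd"],
    "covariates": ["none", "age", "age+sex"],
    "model": ["ols", "robust"],
}

# Hypothetical additive shift each option applies to the effect size.
OPTION_EFFECT = {
    "outlier_rule": {"none": 0.0, "2sd": 0.6, "3sd": 0.3},
    "covariates": {"none": 0.0, "age": 0.1, "age+sex": 0.15},
    "model": {"ols": 0.0, "robust": 0.02},
}


def run_universe(universe):
    """Stand-in for fitting one analysis; returns a synthetic effect size."""
    return 1.0 + sum(OPTION_EFFECT[d][o] for d, o in universe.items())


def sensitivity(results):
    """Assumed metric: variance of the mean effect across a decision's options."""
    scores = {}
    for decision, options in DECISIONS.items():
        group_means = []
        for option in options:
            values = [y for u, y in results if u[decision] == option]
            if values:  # an option may be missing from a small sample
                group_means.append(mean(values))
        scores[decision] = pvariance(group_means) if len(group_means) > 1 else 0.0
    return scores


def rank(scores):
    """Order decisions from most to least sensitive."""
    return sorted(scores, key=scores.get, reverse=True)


names = list(DECISIONS)
all_universes = [
    dict(zip(names, combo)) for combo in itertools.product(*DECISIONS.values())
]

# Exhaustive evaluation: every universe in the multiverse.
full = [(u, run_universe(u)) for u in all_universes]

# Uniform random sampling without replacement: half of the universes.
rng = random.Random(0)
partial = [(u, run_universe(u)) for u in rng.sample(all_universes, k=9)]

full_ranking = rank(sensitivity(full))
sample_ranking = rank(sensitivity(partial))
print("full:  ", full_ranking)
print("sample:", sample_ranking)
```

With the exhaustive evaluation, the ranking follows the spread of the hypothetical option effects. The sampled ranking may or may not agree, depending on which universes are drawn, and that uncertainty is what confidence intervals on the sensitivity estimates would be meant to convey.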