Big data and machine learning tools have jointly empowered humans to make data-driven decisions. However, many of these tools capture empirical associations that may be spurious due to confounding factors and subgroup heterogeneity. The famous Simpson's paradox is one such phenomenon, in which aggregated and subgroup-level associations contradict each other, causing cognitive confusion and making it difficult to reach adequate interpretations and decisions. Existing tools provide little insight to help humans locate, reason about, and prevent the pitfalls of spurious associations in practice. We propose VISPUR, a visual analytic system that provides a causal analysis framework and a human-centric workflow for tackling spurious associations. The system includes a CONFOUNDER DASHBOARD, which automatically identifies possible confounding factors, and a SUBGROUP VIEWER, which supports the visualization and comparison of diverse subgroup patterns that may lead to a misinterpretation of causality. Additionally, we propose a REASONING STORYBOARD, which uses a flow-based approach to illustrate paradoxical phenomena, and an interactive DECISION DIAGNOSIS panel that helps ensure accountable decision-making. Through an expert interview and a controlled user experiment, our qualitative and quantitative results demonstrate that the proposed "de-paradox" workflow and the designed visual analytic system are effective in helping users identify and understand spurious associations and make accountable causal decisions.
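To make the contradiction between aggregated and subgroup-level associations concrete, the following minimal sketch checks for a Simpson's reversal in a small contingency table. The counts follow the commonly cited kidney-stone treatment example and are not taken from VISPUR or its evaluation; the function names (`subgroup_rates`, `aggregate_rates`, `is_simpson_reversal`) are hypothetical and do not reflect the system's implementation.

```python
from fractions import Fraction

# Illustrative (successes, trials) counts per treatment and subgroup.
# These numbers are an assumption for the sketch, not data from the paper.
counts = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def subgroup_rates(table):
    """Success rate of each treatment within each subgroup."""
    return {
        treatment: {group: Fraction(s, n) for group, (s, n) in groups.items()}
        for treatment, groups in table.items()
    }


def aggregate_rates(table):
    """Success rate of each treatment after pooling all subgroups."""
    return {
        treatment: Fraction(
            sum(s for s, _ in groups.values()),
            sum(n for _, n in groups.values()),
        )
        for treatment, groups in table.items()
    }


def is_simpson_reversal(table, a="A", b="B"):
    """True if `a` beats `b` in every subgroup but loses in aggregate."""
    sub = subgroup_rates(table)
    agg = aggregate_rates(table)
    a_wins_every_subgroup = all(sub[a][g] > sub[b][g] for g in sub[a])
    return a_wins_every_subgroup and agg[a] < agg[b]


if __name__ == "__main__":
    for treatment, rate in aggregate_rates(counts).items():
        print(f"aggregate {treatment}: {float(rate):.3f}")
    for treatment, groups in subgroup_rates(counts).items():
        for group, rate in groups.items():
            print(f"{treatment} / {group}: {float(rate):.3f}")
    print("Simpson's reversal:", is_simpson_reversal(counts))
```

In this table, treatment A has the higher success rate within both subgroups, yet treatment B looks better once the subgroups are pooled, because the two treatments were applied to the subgroups in very different proportions. Here the subgroup variable acts as the confounder.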