Hybrid human-ML systems increasingly make consequential decisions in a wide range of domains. These systems are often introduced with the expectation that the combined human-ML system will achieve complementary performance, that is, that the combined decision-making system will outperform either decision-making agent in isolation. However, empirical results have been mixed, and existing research rarely articulates the sources and mechanisms by which complementary performance is expected to arise. Our goal in this work is to provide conceptual tools to advance the way researchers reason and communicate about human-ML complementarity. Drawing upon prior literature in human psychology, machine learning, and human-computer interaction, we propose a taxonomy characterizing distinct ways in which human and ML-based decision-making can differ. In doing so, we conceptually map potential mechanisms by which combining human and ML decision-making may yield complementary performance, developing a language for the research community to reason about the design of hybrid systems in any decision-making domain. To illustrate how our taxonomy can be used to investigate complementarity, we provide a mathematical aggregation framework for examining the conditions that enable complementarity. Through synthetic simulations, we demonstrate how this framework can be used to explore specific aspects of our taxonomy and shed light on the optimal mechanisms for combining human and ML judgments.