Human-AI complementarity is the claim that a human supported by an AI system can outperform either alone in a decision-making process. Since its introduction in the human-AI interaction literature, it has gained traction by generalizing the reliance paradigm and by offering a more practical alternative to the contested construct of trust in AI. Yet complementarity faces key theoretical challenges: it lacks precise theoretical anchoring, it is formalized only as a post hoc indicator of relative predictive accuracy, it remains silent about other desiderata of human-AI interactions, and it abstracts away from the magnitude-cost profile of its performance gain. As a result, complementarity is difficult to obtain in empirical settings. In this work, we leverage epistemology to address these challenges by reframing complementarity within the discourse on justificatory AI. Drawing on computational reliabilism, we argue that historical instances of complementarity function as evidence that a given human-AI interaction is a reliable epistemic process for a given predictive task. Together with other reliability indicators that assess the alignment of the human-AI team with epistemic standards and socio-technical practices, complementarity contributes to the degree of reliability of human-AI teams when generating predictions. This repositioning supports the practical reasoning of those affected by these outputs (patients, managers, regulators, and others). Our approach suggests that the role and value of complementarity lie not in providing a stand-alone measure of relative predictive accuracy, but in helping calibrate decision-making to the reliability of AI-supported processes. We conclude by translating this repositioning into design- and governance-oriented recommendations, including a minimal reporting checklist for justificatory human-AI interactions and measures of efficient complementarity.