As Artificial Intelligence (AI) systems increasingly influence decision-making across various fields, the need to attribute responsibility for undesirable outcomes has become essential, though complicated by the complex interplay between humans and AI. Existing attribution methods based on actual causality and Shapley values tend to disproportionately blame agents who contribute more to an outcome and rely on real-world measures of blameworthiness that may misalign with responsible AI standards. This paper presents a causal framework using Structural Causal Models (SCMs) to systematically attribute responsibility in human-AI systems, measuring overall blameworthiness while employing counterfactual reasoning to account for agents' expected epistemic levels. Two case studies illustrate the framework's adaptability in diverse human-AI collaboration scenarios.
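The core idea of the abstract can be illustrated with a toy sketch. This is not the paper's actual model: it assumes a hypothetical two-agent SCM in which a human and an AI each make a binary decision, a bad outcome occurs only if both err, counterfactual pivotality is computed by flipping one agent's action at a time, and blame is weighted by each agent's assumed epistemic level (all names and numbers are illustrative).

```python
# Toy sketch (hypothetical, not the paper's model): a two-agent SCM where a
# human and an AI each issue a binary action, and the undesirable outcome
# occurs only if both err.

def outcome(human_err: bool, ai_err: bool) -> bool:
    """Structural equation: the bad outcome occurs iff both agents err."""
    return human_err and ai_err

def counterfactual_pivotality(agent: str, human_err: bool, ai_err: bool) -> int:
    """1 if flipping this agent's action alone would have averted the outcome."""
    actual = outcome(human_err, ai_err)
    if agent == "human":
        flipped = outcome(not human_err, ai_err)
    else:
        flipped = outcome(human_err, not ai_err)
    return int(actual and not flipped)

def blame(epistemic: dict, human_err: bool, ai_err: bool) -> dict:
    """Scale counterfactual pivotality by each agent's expected epistemic
    level in [0, 1] (how much the agent could reasonably have known),
    then normalize so blame shares sum to 1."""
    raw = {a: counterfactual_pivotality(a, human_err, ai_err) * epistemic[a]
           for a in ("human", "ai")}
    total = sum(raw.values()) or 1.0
    return {a: v / total for a, v in raw.items()}

# Both agents err; the human was expected to know more, so bears more blame.
shares = blame({"human": 0.9, "ai": 0.6}, human_err=True, ai_err=True)
print(shares)
```

Note that under this toy structural equation both agents are counterfactually pivotal, so a pure actual-causality measure would split blame evenly; the epistemic weights are what tilt the attribution, mirroring the abstract's point about accounting for agents' expected epistemic levels.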