Explaining autonomous and intelligent systems is critical to building trust in their decisions. Counterfactuals have emerged as one of the most compelling forms of explanation. They address ``why not'' questions by revealing how decisions could be altered. Despite the growing literature, most existing explainers focus on a single type of counterfactual and are restricted to local explanations of individual instances. There has been no systematic study of alternative counterfactual types, nor of global counterfactuals that shed light on a system's overall reasoning process. This paper addresses both gaps by introducing an axiomatic framework built on a set of desirable properties for counterfactual explainers. It proves impossibility theorems showing that no single explainer can satisfy certain combinations of axioms simultaneously, and fully characterizes all compatible sets. Representation theorems then establish five one-to-one correspondences between specific subsets of axioms and the families of explainers that satisfy them. Each family gives rise to a distinct type of counterfactual explanation, uncovering five fundamentally different types of counterfactuals. Some of these correspond to local explanations, while others capture global explanations. Finally, the framework situates existing explainers within this taxonomy, formally characterizes their behavior, and analyzes the computational complexity of generating such explanations.