In the field of Explainable Artificial Intelligence (XAI), counterfactual examples explain the predictions of a trained decision model to a user by indicating the modifications to make to an instance in order to change its associated prediction. These counterfactual examples are generally defined as solutions to an optimization problem whose cost function combines several criteria that quantify desiderata for a good explanation meeting user needs. A large variety of such properties can be considered, but because user needs are generally unknown and differ from one user to another, selecting and formalizing them is difficult. To circumvent this issue, several approaches propose generating a set of diverse counterfactual examples, rather than a single one, to explain a prediction. This paper reviews the numerous, sometimes conflicting, definitions that have been proposed for this notion of diversity. It discusses their underlying principles as well as the hypotheses about user needs on which they rely, and proposes to categorize them along several dimensions (explicit vs. implicit, universe in which they are defined, level at which they apply), leading to the identification of further research challenges on this topic.