Selective Explanations: Leveraging Human Input to Align Explainable AI

While a vast collection of explainable AI (XAI) algorithms has been developed in recent years, they are often criticized for significant gaps relative to how humans produce and consume explanations. As a result, current XAI techniques are often found to be hard to use and ineffective. In this work, we attempt to close these gaps by making AI explanations selective, a fundamental property of human explanations: from a large set of model reasons, we present only the subset that aligns with the recipient's preferences. We propose a general framework for generating selective explanations by leveraging human input on a small sample. This framework opens up a rich design space that accounts for different selectivity goals, types of input, and more. As a showcase, we use a decision-support task to explore selective explanations based on what the decision-maker would consider relevant to the decision task. We conducted two experimental studies to examine three paradigms out of the broader set made possible by our proposed framework. In Study 1, we asked participants to provide their own input to generate selective explanations, with either open-ended or critique-based input. In Study 2, we showed participants selective explanations based on input from a panel of similar users (annotators). Our experiments demonstrate the promise of selective explanations in reducing over-reliance on AI and improving decision outcomes and subjective perceptions of the AI, but also paint a nuanced picture that attributes some of these positive effects to the opportunity to provide one's own input to augment AI explanations. Overall, our work proposes a novel XAI framework inspired by human communication behaviors and demonstrates its potential, with the aim of encouraging future work to better align AI explanations with human production and consumption of explanations.
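
To make the idea of selectivity concrete, the following is a minimal sketch and not the authors' implementation. It assumes that model reasons are per-feature attribution scores and that human input is a set of relevant/irrelevant marks on a small annotated sample. The function names `estimate_relevance` and `select_explanation`, the threshold `min_relevance`, and the cutoff `k` are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple


def estimate_relevance(annotations: List[Dict[str, bool]]) -> Dict[str, float]:
    """Hypothetical helper: for each feature, the fraction of annotated
    samples in which a person marked it as relevant."""
    marked: Dict[str, int] = {}
    seen: Dict[str, int] = {}
    for annotation in annotations:
        for feature, is_relevant in annotation.items():
            seen[feature] = seen.get(feature, 0) + 1
            marked[feature] = marked.get(feature, 0) + int(is_relevant)
    return {feature: marked[feature] / seen[feature] for feature in seen}


def select_explanation(
    attributions: Dict[str, float],
    relevance: Dict[str, float],
    k: int = 3,
    min_relevance: float = 0.5,
) -> List[Tuple[str, float]]:
    """Hypothetical helper: keep at most k model reasons whose estimated
    relevance meets the threshold, ordered by attribution magnitude."""
    candidates = [
        (feature, score)
        for feature, score in attributions.items()
        if relevance.get(feature, 0.0) >= min_relevance
    ]
    candidates.sort(key=lambda pair: abs(pair[1]), reverse=True)
    return candidates[:k]


# Illustrative data only (assumed, not taken from the studies).
annotations = [
    {"great": True, "the": False, "boring": True},
    {"great": True, "the": False, "movie": False},
    {"boring": True, "movie": True, "the": False},
]
attributions = {"great": 0.42, "the": 0.30, "boring": -0.55, "movie": 0.10}

relevance = estimate_relevance(annotations)
selected = select_explanation(attributions, relevance, k=2)
print(selected)
```

In this sketch the full explanation would list all four features, while the selective explanation drops "the" (never marked relevant) and keeps only the two strongest reasons among the remaining features. Whether the input comes from the recipient or from a panel of annotators changes only where `annotations` comes from.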
