Selective Explanations: Leveraging Human Input to Align Explainable AI

While a vast collection of explainable AI (XAI) algorithms has been developed in recent years, these algorithms are often criticized for diverging significantly from how humans produce and consume explanations. As a result, current XAI techniques are often found to be hard to use and ineffective. In this work, we attempt to close these gaps by making AI explanations selective, a fundamental property of human explanations: from a large set of model reasons, we present only the subset that aligns with the recipient's preferences. We propose a general framework for generating selective explanations by leveraging human input on a small sample. This framework opens up a rich design space that accounts for different selectivity goals, types of input, and more. As a showcase, we use a decision-support task to explore selective explanations based on what the decision-maker would consider relevant to the decision task. We conducted two experimental studies to examine three paradigms out of the broader set that our proposed framework makes possible. In Study 1, we asked participants to provide their own input to generate selective explanations, with either open-ended or critique-based input. In Study 2, we showed participants selective explanations based on input from a panel of similar users (annotators). Our experiments demonstrate the promise of selective explanations in reducing over-reliance on AI and improving decision outcomes and subjective perceptions of the AI, but also paint a nuanced picture that attributes some of these positive effects to the opportunity to provide one's own input to augment AI explanations. Overall, our work proposes a novel XAI framework inspired by human communication behaviors and demonstrates its potential, with the aim of encouraging future work to better align AI explanations with human production and consumption of explanations.
