Is Ignorance Bliss? The Role of Post Hoc Explanation Faithfulness and Alignment in Model Trust

Post hoc explanations have emerged as a way to improve user trust in machine learning models by providing insight into model decision-making. However, explanations tend to be evaluated based on their alignment with prior knowledge, while the faithfulness of an explanation with respect to the model, a fundamental criterion, is often overlooked. Furthermore, it is unclear how explanation faithfulness and alignment affect user trust, and whether this effect differs between laypeople and domain experts. To investigate these questions, we conduct a user study with computer science students and doctors in three domain areas, controlling the layperson and domain expert groups in each setting. The results indicate that laypeople base their trust in explanations on explanation faithfulness, while domain experts base theirs on explanation alignment. To our knowledge, this work is the first to show that (1) different factors affect the trust of laypeople and of domain experts in post hoc explanations and (2) domain experts are subject to specific biases due to their expertise when interpreting post hoc explanations. By uncovering this phenomenon and exposing this cognitive bias, this work motivates the need to educate end users about how to properly interpret explanations and overcome their own cognitive biases, as well as the development of simple and interpretable faithfulness metrics for end users. This research is particularly important and timely because post hoc explanations are increasingly being used in high-stakes, real-world settings such as medicine.

