Post hoc explanations have emerged as a way to improve user trust in machine learning models by providing insight into model decision-making. However, explanations tend to be evaluated on their alignment with prior knowledge, while a fundamental criterion, the faithfulness of an explanation to the model, is often overlooked. Furthermore, it is unclear how explanation faithfulness and alignment affect user trust, and whether this effect differs between laypeople and domain experts. To investigate these questions, we conduct a user study with computer science students and doctors in three domain areas, controlling the layperson and domain expert groups in each setting. The results indicate that laypeople base their trust in explanations on explanation faithfulness, while domain experts base theirs on explanation alignment. To our knowledge, this work is the first to show that (1) different factors affect laypeople's and domain experts' trust in post hoc explanations and (2) domain experts are subject to specific biases, arising from their expertise, when interpreting post hoc explanations. By uncovering this phenomenon and exposing this cognitive bias, this work motivates both educating end users on how to interpret explanations properly and overcome their own cognitive biases, and developing simple, interpretable faithfulness metrics for end users. This research is particularly important and timely because post hoc explanations are increasingly used in high-stakes, real-world settings such as medicine.