Post hoc explanations have emerged as a way to improve user trust in machine learning models by providing insight into model decision-making. However, explanations tend to be evaluated based on their alignment with prior knowledge, while the faithfulness of an explanation with respect to the model, a fundamental criterion, is often overlooked. Furthermore, the effect of explanation faithfulness and alignment on user trust, and whether this effect differs between laypeople and domain experts, is unclear. To investigate these questions, we conduct a user study with computer science students and doctors in three domain areas, controlling the layperson and domain expert groups in each setting. The results indicate that laypeople base their trust in explanations on explanation faithfulness, while domain experts base theirs on explanation alignment. To our knowledge, this work is the first to show that (1) different factors affect laypeople's and domain experts' trust in post hoc explanations and (2) domain experts are subject to specific biases due to their expertise when interpreting post hoc explanations. By uncovering this phenomenon and exposing this cognitive bias, this work motivates the need to educate end users about how to properly interpret explanations and overcome their own cognitive biases, as well as the development of simple and interpretable faithfulness metrics for end users. This research is particularly important and timely because post hoc explanations are increasingly being used in high-stakes, real-world settings such as medicine.