Post-hoc explainability methods aim to clarify the predictions of black-box machine learning models. However, it remains largely unclear how well users comprehend the explanations these methods provide and whether the explanations increase users' ability to predict model behavior. We approach this question by conducting a user study that evaluates comprehensibility and predictability in two widely used tools: LIME and SHAP. Moreover, we investigate the effect of counterfactual explanations and misclassifications on users' ability to understand and predict model behavior. We find that the comprehensibility of SHAP is significantly reduced when explanations are provided for samples near a model's decision boundary. Furthermore, we find that counterfactual explanations and misclassifications can significantly increase users' understanding of how a machine learning model makes decisions. Based on our findings, we also derive design recommendations for future post-hoc explainability methods with increased comprehensibility and predictability.