Model explanations can be valuable for interpreting and debugging predictive models. We study a specific kind called Concept Explanations, where the goal is to interpret a model using human-understandable concepts. Although popular for their ease of interpretation, concept explanations are known to be noisy. We begin by identifying various sources of uncertainty in the estimation pipeline that lead to such noise. We then propose an uncertainty-aware Bayesian estimation method to address these issues, which readily improves the quality of explanations. Through theoretical analysis and empirical evaluation, we show that explanations computed by our method are robust to train-time choices while also being label-efficient. Furthermore, in an evaluation with real datasets and off-the-shelf models, our method recovers relevant concepts from a bank of thousands, demonstrating its scalability. We believe the improved quality of uncertainty-aware concept explanations makes them a strong candidate for more reliable model interpretation. We release our code at https://github.com/vps-anonconfs/uace.