With the continued development of Convolutional Neural Networks (CNNs), there is growing concern regarding the representations that they encode internally. Analyzing these internal representations is referred to as model interpretation. While the task of model explanation, that is, justifying the predictions of such models, has been studied extensively, the task of model interpretation has received less attention. The aim of this paper is to propose a framework for the study of interpretation methods designed for CNN models trained on visual data. More specifically, we first specify the difference between the interpretation and explanation tasks, which are often considered the same in the literature. Second, we define a set of six specific factors that can be used to characterize interpretation methods. Third, based on these factors, we propose a framework for the positioning of interpretation methods. Our framework highlights that only a very small number of the suggested factors, and combinations thereof, have actually been studied, leaving significant areas unexplored. Following the proposed framework, we discuss existing interpretation methods and give some attention to the evaluation protocols followed to validate them. Finally, the paper highlights the capabilities of the methods in producing feedback for enabling interpretation and proposes possible research problems arising from the framework.