Deep neural networks have demonstrated superior performance in artificial intelligence applications, but the opacity of their inner workings remains a major drawback in practice. The prevailing unit-based interpretation is a statistical observation of stimulus-response data, and it fails to reveal the detailed internal processes by which a network operates. In this work, we analyze a convolutional neural network (CNN) trained for a classification task and present an algorithm that extracts the diffusion pathways of individual pixels, identifying the locations of pixels in an input image that are associated with object classes. The pathways allow us to test the causal components that are important for classification, and the pathway-based representations are clearly distinguishable between categories. We find that the few largest pathways of an individual pixel tend to cross the feature maps in each layer that are important for classification, and that the large pathways of images from the same category follow more consistent trends than those of images from different categories. We also apply the pathways to understanding adversarial attacks, object completion, and movement perception. Further, the total number of pathways on feature maps across all layers can clearly discriminate between original, deformed, and target samples.