Neural networks are widely adopted to solve complex and challenging tasks. Especially in high-stakes decision-making, understanding their reasoning process is crucial, yet this proves challenging for modern deep networks. Feature visualization (FV) is a powerful tool to decode what information neurons are responding to, and hence to better understand the reasoning behind such networks. In particular, in FV we generate human-understandable images that reflect the information detected by neurons of interest. However, current methods often yield unrecognizable visualizations, exhibiting repetitive patterns and visual artifacts that are hard for a human to understand. To address these problems, we propose to guide FV through statistics of real image features combined with measures of relevant network flow to generate prototypical images. Our approach yields human-understandable visualizations that both qualitatively and quantitatively improve over state-of-the-art FVs across various architectures. As such, it can be used to decode which information the network uses, complementing mechanistic circuit analyses that identify where it is encoded. Code is available at: https://github.com/adagorgun/VITAL
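To make the underlying idea concrete, the following is a minimal sketch of classic activation-maximization feature visualization, the standard FV baseline the abstract refers to (not the proposed guided method). It uses a toy one-layer linear "neuron" in NumPy purely for illustration; in practice the neuron is a unit inside a deep network and the gradient is obtained by backpropagation. All names and hyperparameters here are illustrative assumptions.

```python
import numpy as np

# Toy illustration of activation maximization: starting from noise, ascend the
# gradient of a neuron's activation w.r.t. the input image so the input becomes
# the neuron's "preferred" stimulus. Here the "neuron" is a random linear
# filter over an 8x8 image; a deep-network neuron would replace it in practice.
rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8))               # toy neuron: a fixed linear filter
x = rng.normal(scale=0.01, size=(8, 8))   # start from near-zero noise

def activation(img):
    """Neuron response to the image (inner product with the filter)."""
    return float(np.sum(w * img))

lr, l2 = 0.1, 0.01                        # step size and L2 regularization
before = activation(x)
for _ in range(100):
    grad = w - l2 * x                     # d/dx [ w.x - (l2/2)*||x||^2 ]
    x = x + lr * grad                     # gradient ascent on the input image
after = activation(x)

# The optimized input aligns with the filter, i.e. it visualizes the feature.
cos = np.sum(w * x) / (np.linalg.norm(w) * np.linalg.norm(x))
```

After optimization, `after` far exceeds `before` and `x` is nearly parallel to `w`, which is exactly why unregularized, unguided ascent on real networks tends to produce the repetitive, artifact-laden images the abstract criticizes.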