Neural networks are widely adopted to solve complex and challenging tasks. Especially in high-stakes decision-making, understanding their reasoning process is crucial, yet this proves challenging for modern deep networks. Feature visualization (FV) is a powerful tool for decoding what information neurons respond to, and hence for better understanding the reasoning behind such networks. Specifically, FV generates human-understandable images that reflect the information detected by neurons of interest. However, current methods often yield unrecognizable visualizations, exhibiting repetitive patterns and visual artifacts that are hard for a human to interpret. To address these problems, we propose to guide FV through statistics of real image features combined with measures of relevant network flow to generate prototypical images. Our approach yields human-understandable visualizations that both qualitatively and quantitatively improve over state-of-the-art FVs across various architectures. As such, it can be used to decode which information the network uses, complementing mechanistic circuits that identify where it is encoded. Code is available at: https://github.com/adagorgun/VITAL