Modern deep networks are highly complex, and their inferential outcomes are very hard to interpret. This is a serious obstacle to their transparent deployment in safety-critical or bias-aware applications. This work contributes to post-hoc interpretability, and specifically to Network Dissection. Our goal is to present a framework that makes it easier to discover the individual functionality of each neuron in a network trained on a vision task; discovery is performed by generating textual descriptions. To achieve this objective, we leverage: (i) recent advances in multimodal vision-text models, and (ii) network layers founded upon the novel concept of stochastic local competition between linear units. In this setting, only a small subset of a layer's neurons is activated for a given input, leading to extremely high activation sparsity (as few as $\approx 4\%$ of neurons active). Crucially, our proposed method infers (sparse) neuron activation patterns that enable the neurons to activate on, and specialize to, inputs with specific characteristics, diversifying their individual functionality. This capacity greatly strengthens the potential of dissection processes: human-understandable descriptions are generated only for the very few active neurons, thus facilitating direct investigation of the network's decision process. As we experimentally show, our approach: (i) yields vision networks that retain or improve classification performance, and (ii) realizes a principled framework for text-based description and examination of the generated neuronal representations.
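To make the idea of stochastic local competition between linear units concrete, the following is a minimal sketch and not the implementation used in this work. It assumes, purely for illustration, that linear units are grouped into fixed-size blocks, that exactly one winner per block is sampled from a softmax over the block's linear responses, and that all other units in the block output zero. The function name `stochastic_lwta` and the parameter `block_size` are hypothetical.

```python
import numpy as np

def stochastic_lwta(x, W, block_size, rng):
    """Illustrative stochastic local winner-takes-all layer (assumed form).

    x: input vector of shape (d_in,)
    W: weight matrix of shape (d_in, n_blocks * block_size)
    Returns the sparse layer output and the binary winner mask.
    """
    h = x @ W                               # responses of the linear units
    blocks = h.reshape(-1, block_size)      # one row per competing block

    # Softmax within each block gives the winner probabilities (assumption).
    z = blocks - blocks.max(axis=1, keepdims=True)
    p = np.exp(z)
    p /= p.sum(axis=1, keepdims=True)

    # Sample one winner per block; the remaining units are silenced.
    winners = np.array([rng.choice(block_size, p=row) for row in p])
    mask = np.zeros_like(blocks)
    mask[np.arange(blocks.shape[0]), winners] = 1.0

    return (blocks * mask).reshape(-1), mask.reshape(-1)
```

Under these assumptions the fraction of active units per layer is `1 / block_size`, so larger blocks give sparser activations; only the sampled winners would then need a textual description for a given input.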