Identifying Interpretable Visual Features in Artificial and Biological Neural Systems

Single neurons in neural networks are often ``interpretable'' in that they represent individual, intuitively meaningful features. However, many neurons exhibit $\textit{mixed selectivity}$, i.e., they represent multiple unrelated features. A recent hypothesis proposes that features in deep networks may be represented in $\textit{superposition}$, i.e., on non-orthogonal axes by multiple neurons, since the number of possible interpretable features in natural data is generally larger than the number of neurons in a given network. Accordingly, we should be able to find meaningful directions in activation space that are not aligned with individual neurons. Here, we propose (1) an automated method for quantifying visual interpretability that is validated against a large database of human psychophysics judgments of neuron interpretability, and (2) an approach for finding meaningful directions in network activation space. We leverage these methods to discover directions in convolutional neural networks that are more intuitively meaningful than individual neurons, as we confirm and investigate in a series of analyses. Moreover, we apply the same method to two recent datasets of visual neural responses in the brain and find that our conclusions largely transfer to real neural data, suggesting that superposition might be deployed by the brain. This also provides a link with disentanglement and raises fundamental questions about robust, efficient and factorized representations in both artificial and biological neural systems.
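
The superposition idea described above can be illustrated with a toy sketch. This is not the method proposed in the paper; it is a hand-built example under stated assumptions: three features are packed into a two-neuron activation space along evenly spaced, non-orthogonal unit directions, and each feature is read out by projecting onto its direction and rectifying. The function names (`feature_directions`, `encode`, `decode`) and the ReLU readout are hypothetical choices made only for illustration.

```python
import numpy as np

def feature_directions(n_features: int) -> np.ndarray:
    """Unit vectors spaced evenly on a circle (assumed layout, 2-neuron space)."""
    angles = 2 * np.pi * np.arange(n_features) / n_features
    return np.stack([np.cos(angles), np.sin(angles)], axis=1)  # (n_features, 2)

def encode(feature_values: np.ndarray, directions: np.ndarray) -> np.ndarray:
    """Neuron activations: feature directions summed, weighted by feature values."""
    return feature_values @ directions  # (2,)

def decode(activations: np.ndarray, directions: np.ndarray) -> np.ndarray:
    """Project onto each feature direction, then rectify away negative interference."""
    return np.maximum(directions @ activations, 0.0)

# Three features share two neurons, so the directions cannot all be orthogonal.
directions = feature_directions(3)
gram = directions @ directions.T  # off-diagonal entries are nonzero (about -0.5)

one_hot = np.array([0.0, 1.0, 0.0])        # only feature 1 is present
activations = encode(one_hot, directions)  # both neurons are active
recovered = decode(activations, directions)

# Each neuron's weight on each feature: neuron 0 responds to all three features,
# which is the "mixed selectivity" picture at the single-neuron level.
neuron0_weights = directions[:, 0]

print("activations:", activations)
print("recovered features:", recovered)
```

In this sketch, reading along the feature directions recovers the single active feature, whereas either neuron alone responds to more than one feature. The clean recovery depends on the input being sparse (one active feature); with several features active at once, the interference between non-orthogonal directions would no longer be removed by rectification alone.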

