How can we tell whether two neural networks are using the same internal processes for a particular computation? This question is pertinent to multiple subfields of both neuroscience and machine learning, including neuroAI, mechanistic interpretability, and brain-machine interfaces. Standard approaches for comparing neural networks focus on the spatial geometry of latent states. Yet in recurrent networks, computations are implemented at the level of neural dynamics, which do not have a simple one-to-one mapping with geometry. To bridge this gap, we introduce a novel similarity metric that compares two systems at the level of their dynamics. Our method has two components. First, using recent advances in data-driven dynamical systems theory, we learn a high-dimensional linear system that accurately captures core features of the original nonlinear dynamics. Second, we compare these linear approximations via a novel extension of Procrustes Analysis that accounts for how vector fields change under orthogonal transformation. Via four case studies, we demonstrate that our method effectively identifies and distinguishes dynamic structure in recurrent neural networks (RNNs), whereas geometric methods fall short. We additionally show that our method can distinguish learning rules in an unsupervised manner. Our method therefore opens the door to novel data-driven analyses of the temporal structure of neural computation, and to more rigorous testing of RNNs as models of the brain.
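The two components described above can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a toy 3-dimensional linear system, uses a plain least-squares (DMD-style) fit in place of the high-dimensional linear model, and uses a known orthogonal matrix `Q` instead of optimizing over orthogonal transformations. The names `fit_linear_dynamics`, `simulate`, `A_true`, and `Q` are all hypothetical. The sketch shows that when two systems differ only by an orthogonal change of coordinates, their fitted dynamics matrices are related by `A_y = Q A_x Q^T`, so comparing `A_x` and `A_y` entry by entry misses the match, while a comparison that accounts for the transformation recovers it.

```python
import numpy as np


def fit_linear_dynamics(states):
    """Least-squares fit of A such that states[t + 1] ≈ A @ states[t]."""
    past, future = states[:-1], states[1:]
    # lstsq solves past @ A.T ≈ future, so the solution is A transposed.
    a_transposed, *_ = np.linalg.lstsq(past, future, rcond=None)
    return a_transposed.T


def simulate(A, x0, n_steps):
    """Iterate x[t + 1] = A @ x[t] and return all states, one per row."""
    states = np.empty((n_steps, len(x0)))
    states[0] = x0
    for t in range(n_steps - 1):
        states[t + 1] = A @ states[t]
    return states


rng = np.random.default_rng(0)

# Toy dynamics (an assumption for illustration): a decaying rotation in
# the first two coordinates plus a decaying third coordinate.
theta = 0.4
A_true = np.array([
    [0.95 * np.cos(theta), -0.95 * np.sin(theta), 0.0],
    [0.95 * np.sin(theta), 0.95 * np.cos(theta), 0.0],
    [0.0, 0.0, 0.8],
])
x = simulate(A_true, np.ones(3), 100)

# Second "network": identical dynamics seen through a random orthogonal
# change of coordinates, y_t = Q x_t.
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
y = x @ Q.T

A_x = fit_linear_dynamics(x)
A_y = fit_linear_dynamics(y)

# Entry-wise comparison ignores the coordinate change.
naive_distance = np.linalg.norm(A_x - A_y)
# A linear vector field transforms as A -> Q A Q^T under y = Q x.
aligned_distance = np.linalg.norm(A_y - Q @ A_x @ Q.T)

print(f"naive distance:   {naive_distance:.3e}")
print(f"aligned distance: {aligned_distance:.3e}")
```

In the method described above, the aligning transformation is not known in advance and would have to be found by minimizing over orthogonal matrices. The sketch only checks the transformation rule that such a minimization relies on.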