Why do biological and artificial neurons sometimes modularise, each encoding a single meaningful variable, and sometimes entangle their representation of many variables? In this work, we develop a theory of when biologically inspired networks (those that are nonnegative and energy efficient) modularise their representation of source variables (sources). We derive necessary and sufficient conditions on a sample of sources that determine whether the neurons in an optimal biologically inspired linear autoencoder modularise. Our theory applies to any dataset, extending far beyond the case of statistical independence studied in previous work: we show that sources modularise whenever their support is ``sufficiently spread''. From this theory, we extract and validate predictions, across a variety of empirical studies, on how the data distribution affects modularisation in nonlinear feedforward and recurrent neural networks trained on supervised and unsupervised tasks. We then apply these ideas to neuroscience data, showing that range independence can be used to understand the mixing or modularising of spatial and reward information in entorhinal recordings from seemingly conflicting experiments. We also use these results to suggest origins of mixed selectivity beyond the predominant theory of flexible nonlinear classification. In sum, our theory prescribes precise conditions under which neural activities modularise, providing tools for inducing and elucidating modular representations in brains and machines.
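The setting above can be illustrated with a minimal sketch (not the paper's implementation): a linear autoencoder with nonnegative weights and an L2 activity ("energy") penalty, trained by projected gradient descent on two independent uniform sources, whose support is spread in the sense the abstract describes. All names, hyperparameters, and the modularity score below are illustrative assumptions; in this regime each neuron is expected to come to encode a single source.

```python
import numpy as np

# Hypothetical illustration: two independent nonnegative sources with
# spread (uniform) support.
rng = np.random.default_rng(0)
n_samples, n_sources, n_neurons = 2000, 2, 2
S = rng.uniform(0.0, 1.0, (n_samples, n_sources))   # sources, rows = samples

# Nonnegative encoder W and decoder V; loss = reconstruction error
# plus a small L2 penalty on activities (the "energy" cost).
W = rng.uniform(0.1, 1.0, (n_sources, n_neurons))   # encoder weights
V = rng.uniform(0.1, 1.0, (n_neurons, n_sources))   # decoder weights
lr, lam = 0.1, 1e-3
for _ in range(5000):
    X = S @ W                                       # neural activities
    E = X @ V - S                                   # reconstruction error
    grad_V = X.T @ E / n_samples
    grad_W = S.T @ (E @ V.T + lam * X) / n_samples  # recon + energy terms
    W = np.maximum(W - lr * grad_W, 0.0)            # projection keeps weights nonnegative
    V = np.maximum(V - lr * grad_V, 0.0)

mse = np.mean((S @ W @ V - S) ** 2)
# Ad-hoc modularity score: fraction of each neuron's encoding weight placed
# on its strongest source (1.0 means every neuron reads out exactly one source).
modularity = (W / (W.sum(axis=0, keepdims=True) + 1e-12)).max(axis=0).mean()
print(f"mse={mse:.4f}  modularity={modularity:.2f}")
```

Because two nonnegative matrices whose product is (near) the identity must be (near) scaled permutations, driving the reconstruction error down under the nonnegativity constraint pushes the encoder towards a modular, one-source-per-neuron solution.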