Why do biological and artificial neurons sometimes modularise, each encoding a single meaningful variable, and sometimes entangle their representation of many variables? In this work, we develop a theory of when biologically inspired networks -- those that are nonnegative and energy-efficient -- modularise their representation of source variables (sources). We derive necessary and sufficient conditions on a sample of sources that determine whether the neurons in an optimal biologically inspired linear autoencoder modularise. Our theory applies to any dataset, extending far beyond the case of statistical independence studied in previous work. Rather, we show that sources modularise if their support is ``sufficiently spread''. From this theory, we extract and validate predictions, in a variety of empirical studies, on how the data distribution affects modularisation in nonlinear feedforward and recurrent neural networks trained on supervised and unsupervised tasks. Furthermore, we apply these ideas to neuroscience data, showing that range independence can be used to understand the mixing or modularising of spatial and reward information in entorhinal recordings from seemingly conflicting experiments. We also use these results to suggest alternative origins of mixed selectivity, beyond the predominant theory of flexible nonlinear classification. In sum, our theory prescribes precise conditions on when neural activities modularise, providing tools for inducing and elucidating modular representations in brains and machines.