Animals survive in dynamic environments that change at arbitrary timescales, but such data distribution shifts pose a challenge for neural networks. To adapt to change, neural systems may change a large number of parameters, which is a slow process that involves forgetting past information. In contrast, animals leverage distribution changes to segment their stream of experience into tasks and associate them with internal task abstractions. Animals can then respond flexibly by selecting the appropriate task abstraction. However, how such flexible task abstractions may arise in neural systems remains unknown. Here, we analyze a linear gated network where the weights and gates are jointly optimized via gradient descent, but with neuron-like constraints on the gates, including a faster timescale, nonnegativity, and bounded activity. We observe that the weights self-organize into modules specialized for the tasks or subtasks encountered, while the gating layer forms unique representations (task abstractions) that switch in the appropriate weight modules. We analytically reduce the learning dynamics to an effective eigenspace, revealing a virtuous cycle: fast-adapting gates drive weight specialization by protecting previous knowledge, while weight specialization in turn increases the update rate of the gating layer. Task switching in the gating layer accelerates as a function of curriculum block size and task training, mirroring key findings in cognitive neuroscience. We show that the discovered task abstractions support generalization through both task and subtask composition, and we extend our findings to a non-linear network switching between two tasks. Overall, our work offers a theory of cognitive flexibility in animals as arising from joint gradient descent on synaptic and neural gating in a neural network architecture.
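The setup described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the architecture `y = sum_p c[p] * W[p] @ x`, the squared-error loss, the learning rates, the random linear teachers, and the use of clipping to `[0, 1]` to enforce nonnegative, bounded gates are all assumptions made for illustration, as are the names `forward`, `grads`, `step`, and `run_blocks`. The sketch only shows what "joint gradient descent on weights and gates, with faster gates" could look like in code; it does not claim to reproduce any result reported above.

```python
import numpy as np


def forward(W, c, x):
    """Gated linear map: y = sum_p c[p] * W[p] @ x."""
    return np.einsum("p,pij,j->i", c, W, x)


def loss(W, c, x, y_target):
    """Squared error on a single sample (assumed loss)."""
    err = forward(W, c, x) - y_target
    return 0.5 * float(err @ err)


def grads(W, c, x, y_target):
    """Analytic gradients of the loss with respect to weights and gates."""
    err = forward(W, c, x) - y_target                          # (d_out,)
    # dL/dW[p] = c[p] * err x^T
    grad_W = c[:, None, None] * np.outer(err, x)[None, :, :]   # (P, d_out, d_in)
    # dL/dc[p] = err^T W[p] x
    grad_c = np.einsum("i,pij,j->p", err, W, x)                # (P,)
    return grad_W, grad_c


def step(W, c, x, y_target, lr_w=0.01, lr_c=0.1):
    """One joint gradient step; gates use a larger learning rate (faster timescale)."""
    grad_W, grad_c = grads(W, c, x, y_target)
    W_new = W - lr_w * grad_W
    # Assumption: nonnegativity and boundedness are enforced by clipping to [0, 1].
    c_new = np.clip(c - lr_c * grad_c, 0.0, 1.0)
    return W_new, c_new


def run_blocks(n_blocks=6, block_len=200, d=4, n_paths=2, seed=0):
    """Train on a blocked curriculum alternating between two random linear teachers."""
    rng = np.random.default_rng(seed)
    teachers = [rng.standard_normal((d, d)) for _ in range(2)]
    W = 0.1 * rng.standard_normal((n_paths, d, d))
    c = np.full(n_paths, 0.5)
    gate_history = []
    for b in range(n_blocks):
        T = teachers[b % 2]                 # the task switches at every block boundary
        for _ in range(block_len):
            x = rng.standard_normal(d)
            W, c = step(W, c, x, T @ x)
            gate_history.append(c.copy())
    return W, c, np.array(gate_history)


if __name__ == "__main__":
    W, c, gate_history = run_blocks()
    print("final gates:", c)
```

Plotting the columns of `gate_history` against the step index is one way to inspect how the gates respond at block boundaries under these assumed settings.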