The brain can learn to execute a wide variety of tasks quickly and efficiently. Nevertheless, most of the mechanisms that enable this learning are either unclear or highly complicated. Recently, considerable effort has been made in neuroscience and artificial intelligence (AI) to understand and model the structures and mechanisms behind the brain's learning capability. In the current understanding of cognitive neuroscience, it is widely accepted that synaptic plasticity plays an essential role in this capability. Determining how each synapse should change so that overall performance improves is known as the Credit Assignment Problem (CAP) and is a fundamental challenge in both neuroscience and AI. Observations by neuroscientists confirm the role of two important mechanisms in synaptic plasticity: the error feedback system and unsupervised learning. Inspired by these observations, a new learning rule is proposed that fuses reinforcement learning (RL) and unsupervised learning (UL). In the proposed computational model, nonlinear optimal control theory is used to model the error feedback loop and to project the output error onto the neurons' membrane potentials (the neurons' states), and an unsupervised learning rule based on the neurons' membrane potentials or activities is used to simulate synaptic plasticity dynamics so that the output error is minimized.