A biologically plausible way to train an Artificial Neural Network (ANN) is to treat each unit as a stochastic Reinforcement Learning (RL) agent, so that the network as a whole becomes a team of agents. Every unit can then learn via REINFORCE, a local learning rule modulated by a global reward signal, which is more consistent with biologically observed forms of synaptic plasticity. However, this learning method is often slow and scales poorly with network size because structural credit assignment is inefficient: a single reward signal is broadcast to all units, regardless of each unit's individual contribution. Weight Maximization, a previously proposed remedy, replaces a hidden unit's reward signal with the norm of its outgoing weights, so that each hidden unit maximizes the norm of its outgoing weights instead of the global reward signal. In this research report, we analyze the theoretical properties of Weight Maximization and propose a variant, Unbiased Weight Maximization. This new approach provides an unbiased learning rule that increases learning speed and improves asymptotic performance. To our knowledge, this is the first learning rule for a network of Bernoulli-logistic units that is unbiased and whose learning speed scales well with the number of units in the network.
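To make the two reward schemes described above concrete, the following is a minimal NumPy sketch, not the report's implementation. It assumes a two-layer network of Bernoulli-logistic units and a made-up task reward of +1 or -1. The helper names (`sample_layer`, `reinforce_update`, `weight_max_rewards`) are hypothetical. Using the change in the squared norm of the outgoing weights as the per-unit reward is also an assumption of this sketch; the text above only states that the reward is replaced by the norm of the outgoing weights. The Unbiased Weight Maximization variant is not shown.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sample_layer(W, x, rng):
    """Sample a layer of Bernoulli-logistic units.

    Returns the binary activations and the firing probabilities.
    """
    p = sigmoid(W @ x)
    h = (rng.random(p.shape) < p).astype(float)
    return h, p


def reinforce_update(W, x, h, p, reward, lr):
    """One REINFORCE step for a layer of Bernoulli-logistic units.

    `reward` is either a scalar (global reward broadcast to all units)
    or a vector with one entry per unit. For each unit, the gradient of
    the log-probability of its sampled output is (h - p) * x.
    """
    r = np.broadcast_to(np.asarray(reward, dtype=float), h.shape)
    return W + lr * (r * (h - p))[:, None] * x[None, :]


def weight_max_rewards(V_old, V_new):
    """Per-hidden-unit reward (assumption of this sketch): change in the
    squared norm of each unit's outgoing weights, i.e. each column of V.
    """
    return (V_new ** 2).sum(axis=0) - (V_old ** 2).sum(axis=0)


rng = np.random.default_rng(0)
x = np.array([1.0, 0.0, 1.0])
W = rng.normal(scale=0.1, size=(4, 3))  # input -> hidden weights
V = rng.normal(scale=0.1, size=(1, 4))  # hidden -> output weights

h, p_h = sample_layer(W, x, rng)
y, p_y = sample_layer(V, h, rng)
R = 1.0 if y[0] == 1.0 else -1.0  # hypothetical task reward

# The output unit always learns from the global reward.
V_new = reinforce_update(V, h, y, p_y, R, lr=0.1)

# Plain REINFORCE: every hidden unit receives the same global reward.
W_global = reinforce_update(W, x, h, p_h, R, lr=0.1)

# Weight Maximization style: each hidden unit receives its own reward,
# derived from how its outgoing weights changed.
r_hidden = weight_max_rewards(V, V_new)
W_wm = reinforce_update(W, x, h, p_h, r_hidden, lr=0.1)
```

In this sketch a hidden unit that did not fire leaves its outgoing weight unchanged, so its per-unit reward is zero, whereas under plain REINFORCE it would still be updated by the global reward `R`. That difference is the structural credit assignment issue the abstract refers to.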