In recurrent neural networks (RNNs) used to model biological neural networks, noise is typically introduced during training to emulate biological variability and regularize learning. The expectation is that removing the noise at test time should preserve or improve performance. Contrary to this intuition, we find that continuous-time recurrent neural networks (CTRNNs) often perform best at a nonzero noise level, specifically the same level used during training. This noise preference typically arises when noise is injected inside the neural activation function; networks trained with noise injected outside the activation function perform best with zero noise. Through analyses of simple function approximation, maze navigation, and single-neuron regulator tasks, we show that the phenomenon stems from noise-induced shifts of fixed points (stationary distributions) in the underlying stochastic dynamics of the RNNs. These fixed-point shifts depend on the noise level and bias the network outputs when the noise is removed, degrading performance. Analytical and numerical results show that the bias arises when neural states operate near the nonlinearities of the activation function, where noise is asymmetrically attenuated, and that performance optimization incentivizes operation near these nonlinearities. Thus, networks can overfit to the stochastic training environment itself rather than just to the input-output data. The phenomenon is distinct from stochastic resonance, wherein nonzero noise enhances signal processing. Our findings reveal that training noise can become an integral part of the computation learned by recurrent networks, with implications for understanding neural population dynamics and for the design of robust artificial RNNs.
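To make the distinction between the two noise placements concrete, the sketch below shows a minimal Euler-discretized CTRNN update in NumPy. The tanh nonlinearity, parameter names, and noise scaling are illustrative assumptions, not the implementation used in the study; the point is only where the Gaussian noise term enters relative to the activation function, with sigma set to zero recovering the noise-free test condition.

```python
import numpy as np

def ctrnn_step(x, u, W, W_in, b, tau, dt, sigma, noise_inside, rng):
    """One Euler step of a noisy CTRNN state update.

    x: state vector, u: input vector, W / W_in / b: recurrent weights,
    input weights, bias. sigma controls the noise level; sigma=0 gives
    the deterministic (noise-free) dynamics.
    """
    xi = sigma * np.sqrt(dt) * rng.standard_normal(x.shape)  # Gaussian noise increment
    pre = W @ x + W_in @ u + b                                # total synaptic drive
    if noise_inside:
        # Noise injected inside the activation: it is shaped (and asymmetrically
        # attenuated) by the nonlinearity when the state sits near saturation.
        return x + (dt / tau) * (-x + np.tanh(pre + xi))
    # Noise injected outside the activation: it enters the state update additively
    # and is not filtered through the nonlinearity.
    return x + (dt / tau) * (-x + np.tanh(pre)) + xi

# Usage example with arbitrary (hypothetical) sizes and parameters.
rng = np.random.default_rng(0)
n, m = 8, 2
W = rng.normal(size=(n, n)) / np.sqrt(n)
W_in = rng.normal(size=(n, m))
b = np.zeros(n)
x, u = np.zeros(n), np.ones(m)
for _ in range(100):
    x = ctrnn_step(x, u, W, W_in, b, tau=1.0, dt=0.1,
                   sigma=0.1, noise_inside=True, rng=rng)
```

Comparing long runs of this update at the training noise level against runs with sigma set to zero is one way to probe the noise-dependent shift of the stationary distribution described above.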