Periodic activation functions, often referred to as learned Fourier features, have been widely demonstrated to improve sample efficiency and stability in a variety of deep RL algorithms. Potentially incompatible hypotheses have been proposed about the source of these improvements. One is that periodic activations learn low-frequency representations and, as a result, avoid overfitting to bootstrapped targets. Another is that periodic activations learn high-frequency representations that are more expressive, allowing networks to quickly fit complex value functions. We analyse these claims empirically, finding that periodic representations consistently converge to high frequencies regardless of their initialisation frequency. We also find that while periodic activation functions improve sample efficiency, they generalise worse on states with added observation noise, especially when compared to otherwise equivalent networks with ReLU activations. Finally, we show that weight decay regularisation can partially offset the overfitting of periodic activation functions, delivering value functions that learn quickly while also generalising.
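To make the object of study concrete, the following is a minimal sketch of a periodic activation layer (a learned Fourier feature layer) inside a small value network, together with the weight decay regulariser discussed above. It assumes a PyTorch-style setup; the class name `LearnedFourierFeatures`, the `init_scale` knob, the layer widths, the noise level, and the weight-decay coefficient are illustrative assumptions, not the exact configuration used in this work.

```python
import torch
import torch.nn as nn

class LearnedFourierFeatures(nn.Module):
    """Periodic activation layer: phi(x) = sin(Wx + b).

    The scale of W at initialisation sets the initial frequency
    content of the representation; since W and b are learned, the
    frequencies can drift (e.g. upward) during training.
    """

    def __init__(self, in_dim: int, out_dim: int, init_scale: float = 1.0):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        # Re-scale the default weight init to set the starting
        # frequencies (init_scale is an illustrative knob, not the
        # paper's exact initialisation scheme).
        with torch.no_grad():
            self.linear.weight.mul_(init_scale)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sin(self.linear(x))

# A small value network with a periodic first layer in place of ReLU.
value_net = nn.Sequential(
    LearnedFourierFeatures(in_dim=8, out_dim=256, init_scale=1.0),
    nn.Linear(256, 256),
    nn.ReLU(),
    nn.Linear(256, 1),
)

# Weight decay (the regulariser studied above) is applied through the
# optimiser; the coefficient here is illustrative.
optimizer = torch.optim.AdamW(value_net.parameters(), weight_decay=1e-4)

# A hypothetical probe of noise generalisation: compare value
# predictions on clean vs. noise-perturbed observations.
obs = torch.randn(32, 8)
noisy_obs = obs + 0.1 * torch.randn_like(obs)
gap = (value_net(obs) - value_net(noisy_obs)).abs().mean()
```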