Online evolution strategies have become an attractive alternative to automatic differentiation (AD) due to their ability to handle chaotic and black-box loss functions, while also allowing more frequent gradient updates than vanilla Evolution Strategies (ES). In this work, we propose a general class of unbiased online evolution strategies. We analytically and empirically characterize the variance of this class of gradient estimators and identify the one with the least variance, which we term Noise-Reuse Evolution Strategies (NRES). Experimentally, we show that NRES converges faster than existing AD and ES methods, measured in both wall-clock time and total number of unroll steps, across a variety of applications, including learning dynamical systems, meta-training learned optimizers, and reinforcement learning.
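For readers unfamiliar with the baseline, the following is a minimal sketch of a vanilla antithetic ES gradient estimator, the kind of black-box estimator that online variants such as NRES build on. It is not the NRES algorithm itself. The function name `antithetic_es_grad`, its parameters, and the quadratic test loss are hypothetical choices made for illustration only.

```python
import numpy as np


def antithetic_es_grad(loss_fn, theta, sigma=0.1, num_pairs=64, rng=None):
    """Estimate the gradient of the Gaussian-smoothed loss at theta.

    Illustrative sketch only: loss_fn is treated as a black box and is
    evaluated at theta + eps and theta - eps for each sampled perturbation.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    theta = np.asarray(theta, dtype=float)
    grad = np.zeros_like(theta)
    for _ in range(num_pairs):
        # Perturbation eps ~ N(0, sigma^2 I).
        eps = sigma * rng.standard_normal(theta.shape)
        # Antithetic pair: the same noise is used with opposite signs.
        delta = loss_fn(theta + eps) - loss_fn(theta - eps)
        grad += delta * eps / (2.0 * sigma**2)
    return grad / num_pairs


if __name__ == "__main__":
    # Hypothetical smooth loss with known gradient 2 * theta.
    theta0 = np.array([1.0, -2.0, 0.5])
    estimate = antithetic_es_grad(
        lambda x: float(np.sum(x**2)), theta0, num_pairs=20000
    )
    print("ES estimate:  ", estimate)
    print("true gradient:", 2 * theta0)
```

Because the estimator only needs loss evaluations, it applies when the loss is chaotic or not differentiable. In the vanilla setting each evaluation unrolls a full episode, which is why updates are less frequent than in the online setting the abstract describes.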