Unrolled computation graphs are prevalent throughout machine learning, but they pose challenges for gradient estimation methods based on automatic differentiation (AD) when their loss functions exhibit extreme local sensitivity, discontinuity, or black-box characteristics. In such scenarios, online evolution strategies (ES) methods are a more capable alternative, and because they interleave partial unrolls with gradient updates, they are more parallelizable than vanilla ES. In this work, we propose a general class of unbiased online ES methods. We analytically and empirically characterize the variance of this class of gradient estimators and identify the estimator with the least variance, which we term Noise-Reuse Evolution Strategies (NRES). Experimentally, we show that NRES converges faster than existing AD and ES methods, in terms of both wall-clock time and number of unroll steps, across a variety of applications, including learning dynamical systems, meta-training learned optimizers, and reinforcement learning.
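To make the idea of interleaving partial unrolls with gradient updates more concrete, here is a minimal pure-Python sketch of an online antithetic ES gradient estimator with noise reuse. It assumes, as the name suggests, that each worker samples its Gaussian perturbation once at the start of an episode and reuses it for every truncation window of length `K` until the episode of length `T` ends (with `T` assumed to be a multiple of `K`). The class `NRESWorker`, the function `nres_gradient`, and the `step_fn`/`loss_fn` interfaces are hypothetical names chosen for illustration. This is not the authors' implementation, and details such as staggering the workers' start times are omitted.

```python
import random
from typing import Callable, List

Vector = List[float]


class NRESWorker:
    """One antithetic pair of perturbed partial unrolls (hypothetical helper)."""

    def __init__(self, dim: int, init_state: Callable[[], Vector], rng: random.Random):
        self.dim = dim
        self.init_state = init_state
        self.rng = rng
        self.reset()

    def reset(self) -> None:
        # Start a new episode: fresh inner states and ONE fresh noise sample.
        # Under the noise-reuse assumption, this eps is kept for every
        # truncation window until the episode ends.
        self.t = 0
        self.s_plus = self.init_state()
        self.s_minus = self.init_state()
        self.eps = [self.rng.gauss(0.0, 1.0) for _ in range(self.dim)]


def nres_gradient(
    theta: Vector,
    workers: List[NRESWorker],
    step_fn: Callable[[Vector, Vector], Vector],  # (state, params) -> next state
    loss_fn: Callable[[Vector, int], float],      # (state, time step) -> loss
    K: int,                                       # truncation window length
    T: int,                                       # episode length (multiple of K)
    sigma: float,                                 # perturbation scale
) -> Vector:
    """Estimate the gradient of the loss over the next K unroll steps."""
    grad = [0.0] * len(theta)
    for w in workers:
        # Perturb the *current* parameters with the worker's reused noise.
        th_plus = [p + sigma * e for p, e in zip(theta, w.eps)]
        th_minus = [p - sigma * e for p, e in zip(theta, w.eps)]
        loss_plus = 0.0
        loss_minus = 0.0
        # Continue both perturbed unrolls from their stored states.
        for _ in range(K):
            w.s_plus = step_fn(w.s_plus, th_plus)
            w.s_minus = step_fn(w.s_minus, th_minus)
            w.t += 1
            loss_plus += loss_fn(w.s_plus, w.t)
            loss_minus += loss_fn(w.s_minus, w.t)
        # Antithetic ES estimate, averaged over workers.
        coeff = (loss_plus - loss_minus) / (2.0 * sigma * len(workers))
        for i, e in enumerate(w.eps):
            grad[i] += coeff * e
        # Only at the end of an episode are states reset and noise resampled.
        if w.t >= T:
            w.reset()
    return grad
```

In a training loop, one would call `nres_gradient` once per outer step and apply an update such as `theta = [p - lr * g for p, g in zip(theta, grad)]` before the next call, so that each partial unroll is followed by a parameter update. Reusing `eps` across windows is the design choice being illustrated; an estimator that resampled noise every window would instead need extra bookkeeping to stay unbiased.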