Evolution Strategies (ES) is a class of powerful black-box optimisation methods that are highly parallelisable and can handle non-differentiable and noisy objectives. However, naïve ES becomes prohibitively expensive at scale on GPUs due to the low arithmetic intensity of batched matrix multiplications with unstructured random perturbations. We introduce Evolution Guided GeneRal Optimisation via Low-rank Learning (EGGROLL), which improves arithmetic intensity by structuring individual perturbations as rank-$r$ matrices, resulting in a hundredfold increase in training speed for billion-parameter models at large population sizes and achieving up to 91% of the throughput of pure batch inference. We provide a rigorous theoretical analysis of Gaussian ES for high-dimensional parameter objectives, investigating the conditions needed for ES updates to converge in high dimensions. Our results reveal a linearising effect and prove consistency between EGGROLL and ES as the parameter dimension increases. Our experiments show that EGGROLL: (1) enables the stable pretraining of nonlinear recurrent language models that operate purely in integer datatypes, (2) is competitive with GRPO for post-training LLMs on reasoning tasks, and (3) does not compromise performance compared to ES in tabula rasa RL settings, despite being faster.
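To make the core idea concrete, the sketch below illustrates Gaussian ES with rank-$r$ perturbations $E_i = A_i B_i^\top / \sqrt{r}$ in the spirit of EGGROLL. This is a minimal toy example on a quadratic objective, not the paper's implementation; the rank, noise scale, population size, and learning rate are illustrative assumptions.

```python
import numpy as np

# Toy sketch of low-rank ES (assumed hyperparameters, not the paper's setup).
rng = np.random.default_rng(0)
m, k, r = 32, 16, 2           # weight matrix is m x k; perturbation rank r << min(m, k)
n, sigma, lr = 64, 0.1, 0.05  # population size, noise scale, step size

W = np.zeros((m, k))
target = rng.normal(size=(m, k))

def fitness(W_perturbed):
    # Toy black-box objective: negative squared distance to a fixed target.
    return -np.sum((W_perturbed - target) ** 2)

for step in range(200):
    # Each population member's perturbation is rank r: E_i = A_i @ B_i.T,
    # scaled by 1/sqrt(r) so entries have unit variance, matching full-rank noise.
    A = rng.normal(size=(n, m, r))
    B = rng.normal(size=(n, k, r))
    E = np.einsum("nir,njr->nij", A, B) / np.sqrt(r)

    scores = np.array([fitness(W + sigma * e) for e in E])

    # Standard ES gradient estimate with score normalisation.
    z = (scores - scores.mean()) / (scores.std() + 1e-8)
    W += lr / (n * sigma) * np.einsum("n,nij->ij", z, E)
```

In practice the benefit comes from never materialising the $m \times k$ perturbations: the low-rank factors can be folded into the forward pass as two thin matrix multiplications, which is what raises arithmetic intensity on GPUs.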