Fine-tuning large language models (LLMs) achieves strong performance but is often limited by the memory overhead of backpropagation. Zeroth-order (ZO) optimization avoids this overhead by estimating gradients through forward passes alone, yet it typically converges slowly because random Gaussian perturbations yield high-variance gradient estimates in high-dimensional parameter spaces. In this paper, we propose a plug-and-play framework that turns random perturbations into more effective descent directions. The key idea is to draw a small pool of candidate perturbations, evaluate their loss values, and then select or combine those that are best aligned with the optimization objective. We develop two instantiations of this idea: MeZO-GV, which forms a guiding vector from the contrast between low-loss and high-loss perturbation groups, and MeZO-Greedy, which keeps the single best perturbation within a fixed evaluation budget. We theoretically show that both strategies yield a larger per-step reduction in the objective than standard ZO estimation, leading to improved convergence rates. Experiments on LLMs of different scales and architectures confirm that the proposed methods integrate naturally with existing ZO optimizers and consistently improve convergence speed and task accuracy. On OPT-13B, our approach outperforms all ZO baselines across 11 benchmarks and exceeds gradient-based methods on 9 of them, while retaining the memory efficiency of forward-only optimization.
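The abstract does not give the exact update rules, so what follows is only a minimal NumPy sketch of the general idea, built on our own assumptions. A toy isotropic quadratic stands in for the LLM loss. The ZO estimator is a standard two-point (SPSA-style) estimate. The helpers `greedy_direction` and `guiding_direction` are hypothetical simplifications of the selection ideas behind MeZO-Greedy and MeZO-GV, not the authors' algorithms. The sketch contrasts one random Gaussian perturbation with perturbations chosen from a small candidate pool using forward evaluations only.

```python
import numpy as np


def loss(theta: np.ndarray) -> float:
    # Toy isotropic quadratic standing in for the fine-tuning loss (assumption).
    return 0.5 * float(theta @ theta)


def zo_grad(f, theta, z, eps):
    # Two-point (SPSA-style) estimate along z: two forward evaluations, no backprop.
    proj = (f(theta + eps * z) - f(theta - eps * z)) / (2.0 * eps)
    return proj * z


def standard_direction(f, theta, rng, k, eps, lr):
    # Baseline: a single random Gaussian perturbation, as in plain ZO-SGD.
    return rng.standard_normal(theta.shape)


def greedy_direction(f, theta, rng, k, eps, lr):
    # Hypothetical greedy rule: try k Gaussian candidates and keep the one whose
    # ZO step yields the lowest loss (3 forward passes per candidate).
    best_z, best_loss = None, np.inf
    for _ in range(k):
        z = rng.standard_normal(theta.shape)
        trial = f(theta - lr * zo_grad(f, theta, z, eps))
        if trial < best_loss:
            best_z, best_loss = z, trial
    return best_z


def guiding_direction(f, theta, rng, k, eps, lr):
    # Hypothetical guiding-vector rule: rank k candidates by f(theta + eps * z),
    # then contrast the high-loss half with the low-loss half (k forward passes).
    zs = rng.standard_normal((k,) + theta.shape)
    losses = np.array([f(theta + eps * z) for z in zs])
    order = np.argsort(losses)
    half = k // 2
    v = zs[order[-half:]].mean(axis=0) - zs[order[:half]].mean(axis=0)
    # Rescale to the typical norm of a d-dimensional Gaussian, sqrt(d).
    return v * (np.sqrt(theta.size) / np.linalg.norm(v))


def run(direction_fn, steps=200, d=100, k=8, eps=1e-3, lr=5e-3, seed=0):
    # Runs ZO-SGD with the given direction rule from a shared initialization.
    rng = np.random.default_rng(seed)
    theta = np.random.default_rng(42).standard_normal(d)
    for _ in range(steps):
        z = direction_fn(loss, theta, rng, k, eps, lr)
        theta = theta - lr * zo_grad(loss, theta, z, eps)
    return loss(theta)


if __name__ == "__main__":
    for name, fn in [("standard", standard_direction),
                     ("greedy", greedy_direction),
                     ("guiding", guiding_direction)]:
        print(f"{name:>8}: final loss {run(fn):.4f}")
```

Note that the three rules are not budget-matched in this toy. Per step, the baseline uses 2 forward passes, the guiding rule k + 2, and the greedy rule 3k + 2, so the sketch illustrates the direction-selection mechanism rather than reproducing any reported comparison.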