Robust and Efficient Zeroth-Order LLM Fine-Tuning via Adaptive Bayesian Subspace Optimizer

Fine-tuning large language models (LLMs) with zeroth-order (ZO) optimization reduces memory usage by approximating gradients through function evaluations. However, existing methods essentially perform updates in a one-dimensional space, and they collapse or degrade substantially under low-precision training. We introduce BSZO, an adaptive \textbf{B}ayesian \textbf{S}ubspace \textbf{Z}eroth-Order \textbf{O}ptimizer, which applies Kalman filtering to combine finite-difference information across multiple perturbation directions within a subspace. By treating each finite-difference measurement as a noisy observation, BSZO builds a posterior distribution over the subspace-projected gradient and updates it through Bayesian inference, while a residual-based adaptive mechanism tracks changes in the noise level. Theoretical analysis shows that BSZO improves the convergence rate by a factor of $k/\gamma$ compared to standard ZO methods. Experiments on RoBERTa, Mistral, and OPT models show that BSZO outperforms the baselines across various tasks, achieving an absolute average improvement of up to 6.67\% on OPT-13B while remaining robust under fp16/bf16 precision and keeping memory usage close to inference-only baselines (1.00$\times$--1.08$\times$ that of MeZO).
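The idea of fusing finite-difference measurements with a Kalman filter can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the orthonormal random subspace, the coordinate-wise probing directions, the default hyperparameters, and in particular the innovation-based noise update are all assumptions made for illustration. Each central difference along a subspace direction $a$ is treated as a scalar observation $y = a^\top g + \text{noise}$ of the subspace-projected gradient $g$.

```python
import numpy as np

def random_subspace(d, k, rng):
    """Return a d x k matrix with orthonormal columns (assumed subspace choice)."""
    Q, _ = np.linalg.qr(rng.standard_normal((d, k)))
    return Q

def kalman_subspace_gradient(f, x, B, eps=1e-3, prior_var=1.0,
                             noise_var=1e-2, beta=0.1):
    """Estimate g = B^T grad f(x) from function evaluations only.

    Hypothetical sketch: Gaussian prior N(0, prior_var * I) on g, one
    scalar Kalman update per probing direction, and a residual-based
    heuristic for the observation-noise variance r.
    """
    k = B.shape[1]
    m = np.zeros(k)              # posterior mean of the projected gradient
    P = prior_var * np.eye(k)    # posterior covariance
    r = noise_var                # current observation-noise variance
    for i in range(k):
        a = np.zeros(k)
        a[i] = 1.0               # probe the i-th subspace coordinate
        z = B @ a                # perturbation direction in parameter space
        # Central finite difference: noisy observation of a^T g.
        y = (f(x + eps * z) - f(x - eps * z)) / (2.0 * eps)
        resid = y - a @ m        # innovation (residual)
        prior_obs_var = a @ P @ a
        s = prior_obs_var + r    # innovation variance
        K = P @ a / s            # Kalman gain
        m = m + K * resid
        P = P - np.outer(K, a @ P)
        # Assumed adaptive rule: move r toward the residual energy
        # not explained by the prior, with a small positive floor.
        r = (1.0 - beta) * r + beta * max(resid**2 - prior_obs_var, 1e-8)
    return m, P

# Toy usage on a quadratic objective, f(x) = 0.5 * ||x||^2.
rng = np.random.default_rng(0)
f = lambda x: 0.5 * float(x @ x)
x = rng.standard_normal(50)
B = random_subspace(50, 4, rng)
m, P = kalman_subspace_gradient(f, x, B)
x_new = x - 0.5 * (B @ m)        # descent step along the posterior mean
```

In this toy setting the posterior mean is a shrunken version of the true projected gradient, so a step along `B @ m` decreases the objective; the posterior covariance `P` records how much uncertainty remains in each subspace coordinate.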
