Post-Training Quantization (PTQ) is essential for deploying Large Language Models (LLMs) on memory-constrained devices, yet it renders models static and difficult to fine-tune. Standard fine-tuning paradigms, including Reinforcement Learning (RL), fundamentally rely on backpropagation and continuous weights to compute gradients. Thus, they cannot be applied to quantized models, where the parameter space is discrete and non-differentiable. While Evolution Strategies (ES) offer a backpropagation-free alternative, optimization of the quantized parameters can still fail because the gradient estimates vanish or become inaccurate. This paper introduces Quantized Evolution Strategies (QES), an optimization paradigm that performs full-parameter fine-tuning directly in the quantized space. QES is based on two innovations: (1) it integrates accumulated error feedback to preserve high-precision weight-update signals, and (2) it uses stateless seed replay to reduce memory usage to the level of low-precision inference. QES significantly outperforms state-of-the-art zeroth-order fine-tuning methods on a variety of tasks, making direct fine-tuning of quantized models possible. It therefore opens up the possibility of scaling up LLMs entirely in the quantized space. The source code is available at https://github.com/dibbla/Quantized-Evolution-Strategies.
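To make the two ingredients named above concrete, the following is a minimal NumPy sketch of an ES step on integer weights. It is not the QES implementation from the repository. The signed 4-bit grid, the toy fitness function, the hyperparameters, and the names `noise_from_seed`, `perturb`, and `es_step` are all assumptions introduced for illustration. The sketch regenerates each perturbation from its integer seed rather than storing it, and carries the rounding error of each update forward in a residual. It keeps that residual as an explicit floating-point buffer for clarity; how QES avoids holding such state through seed replay is not reproduced here.

```python
import numpy as np

QMIN, QMAX = -8, 7  # assumed signed 4-bit integer grid (illustrative)


def noise_from_seed(seed, shape):
    """Regenerate a Gaussian perturbation from its seed instead of storing it."""
    return np.random.default_rng(seed).standard_normal(shape)


def perturb(q, seed, sigma):
    """Add rounded noise and clip, so the candidate stays on the integer grid."""
    eps = noise_from_seed(seed, q.shape)
    return np.clip(q + np.rint(sigma * eps), QMIN, QMAX).astype(np.int8)


def es_step(q, residual, fitness_fn, seeds, sigma=1.0, lr=0.5):
    """One ES update on integer weights with error feedback (hypothetical helper)."""
    # Evaluate one perturbed candidate per seed; only seeds and scores are kept.
    scores = np.array([fitness_fn(perturb(q, s, sigma)) for s in seeds],
                      dtype=np.float64)
    adv = (scores - scores.mean()) / (scores.std() + 1e-8)

    # Replay the seeds to rebuild the noise and form the ES gradient estimate.
    grad = np.zeros(q.shape)
    for s, a in zip(seeds, adv):
        grad += a * noise_from_seed(s, q.shape)
    grad /= len(seeds) * sigma

    # Error feedback: add the carried-over residual, round to the grid,
    # and keep what rounding discarded for the next step.
    update = lr * grad + residual
    step = np.rint(update)
    new_q = np.clip(q + step, QMIN, QMAX).astype(np.int8)
    return new_q, update - step


# Toy usage: move integer weights toward an assumed integer target.
target = np.array([3, -2, 5, 0], dtype=np.int8)


def fitness(w):
    return -float(np.sum((w.astype(np.float64) - target) ** 2))


q = np.zeros(4, dtype=np.int8)
residual = np.zeros(4)
for t in range(50):
    seeds = [1000 * t + i for i in range(16)]
    q, residual = es_step(q, residual, fitness, seeds)
```

Without the residual, any update smaller than half a quantization step would round to zero and be lost, which is one way the vanishing-update problem described above can arise in a discrete parameter space.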