Quantization-aware training (QAT) simulates quantization during training to reduce the bit-precision of weights and activations. It learns quantized weights indirectly by updating latent weights, i.e., the full-precision inputs to a quantizer, with gradient-based optimizers. We claim that coupling a user-defined learning rate (LR) with these optimizers is sub-optimal for QAT. Quantized weights move between the discrete levels of a quantizer only when the corresponding latent weights cross transition points, where the quantizer changes its discrete state. This suggests that the changes of quantized weights are affected by both the LR for latent weights and the distributions of those weights. It is thus difficult to control the degree of change of quantized weights by scheduling the LR manually. We conjecture that the degree of parameter change in QAT is related to the number of quantized weights transitioning between discrete levels. Based on this, we introduce a transition rate (TR) scheduling technique that controls the number of transitions of quantized weights explicitly. Instead of scheduling an LR for latent weights, we schedule a target TR of quantized weights and update the latent weights with a novel transition-adaptive LR (TALR), which accounts for the degree of change of the quantized weights during QAT. Experimental results demonstrate the effectiveness of our approach on standard benchmarks.
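The core idea can be sketched as follows: measure the fraction of quantized weights that change discrete level in an update step (the transition rate), and adapt the LR so this fraction tracks a scheduled target. This is a minimal illustrative sketch, not the paper's actual TALR formula; the uniform quantizer, the function names, and the multiplicative feedback rule with gain `gain` are all assumptions made for illustration.

```python
import numpy as np

def quantize(latent, step):
    """Uniform quantizer: snap latent weights to discrete levels (illustrative)."""
    return np.round(latent / step) * step

def talr_step(latent, grads, lr, step, target_tr, gain=0.1):
    """One gradient update with a hypothetical transition-adaptive LR.

    Counts the fraction of quantized weights that crossed a transition
    point (changed discrete level), then nudges the LR multiplicatively
    so that this transition rate (TR) moves toward the scheduled target.
    """
    q_old = quantize(latent, step)
    latent = latent - lr * grads           # plain SGD step on latent weights
    q_new = quantize(latent, step)
    tr = np.mean(q_new != q_old)           # observed transition rate
    # Feedback rule (assumed form): raise LR when too few transitions
    # occur, lower it when too many occur.
    lr = lr * np.exp(gain * (target_tr - tr))
    return latent, lr, tr
```

In this sketch the observed TR, not the raw LR, is the quantity being regulated: if the latent-weight distribution drifts so that fewer weights sit near transition points, the LR grows automatically to keep quantized weights changing at the scheduled rate.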