Low-precision fine-tuning of language models has gained prominence as a cost-effective and energy-efficient way to deploy large-scale models across applications. However, the approach is sensitive to outlier values in the activations. Outliers degrade fine-tuning in the low-precision regime because they inflate the scaling factor, which makes smaller values harder to represent. This paper investigates techniques for mitigating outlier activations in low-precision integer fine-tuning of language models. Our proposed approach represents outlier activation values as 8-bit integers instead of floating-point (FP16) values. Because the outliers are kept as integers, operator tiling can be used to avoid 16-bit integer matrix multiplication. We provide theoretical analysis and supporting experiments that demonstrate the effectiveness of our approach in improving the robustness and performance of low-precision fine-tuned language models.
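The effect of an outlier on the scaling factor can be illustrated with a small sketch. It assumes symmetric per-tensor quantization to signed 8-bit integers, where the scale is the largest absolute value divided by 127; this scheme, the helper names (`quantize_int8`, `dequantize`, `max_error_on_small_values`), and the sample activation values are illustrative assumptions, not the method or data of the paper.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization to signed 8-bit codes (assumed scheme)."""
    max_abs = max(abs(v) for v in values)
    # The scale is set by the largest magnitude, so one outlier controls it.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to real values."""
    return [c * scale for c in codes]


def max_error_on_small_values(values, threshold=1.0):
    """Largest round-trip error among values with magnitude <= threshold."""
    codes, scale = quantize_int8(values)
    restored = dequantize(codes, scale)
    return max(abs(v - r) for v, r in zip(values, restored) if abs(v) <= threshold)


# Hypothetical activations: small values only, then the same values plus one outlier.
activations = [0.02, -0.15, 0.31, -0.48, 0.07, 0.50]
with_outlier = activations + [60.0]

err_clean = max_error_on_small_values(activations)
err_outlier = max_error_on_small_values(with_outlier)

print("max error on small values, no outlier:  ", err_clean)
print("max error on small values, with outlier:", err_outlier)
```

With the outlier present, the scale grows by roughly two orders of magnitude, so most of the small activations collapse onto the codes 0 or ±1 and their round-trip error rises accordingly.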