Fair-GPTQ: Bias-Aware Quantization for Large Language Models

The high memory demands of generative language models have drawn attention to quantization, which reduces computational cost, memory usage, and latency by mapping model weights to lower-precision integers. Approaches such as GPTQ effectively minimize input-weight product errors during quantization; however, recent empirical studies show that they can increase biased outputs and degrade performance on fairness benchmarks, and it remains unclear which specific weights cause this issue. In this work, we draw new links between quantization and model fairness by adding explicit group-fairness constraints to the quantization objective, and we introduce Fair-GPTQ, the first quantization method explicitly designed to reduce unfairness in large language models. The added constraints guide the learning of the rounding operation toward less-biased text generation for protected groups. Specifically, we focus on stereotype generation involving occupational bias and on discriminatory language spanning gender, race, and religion. Fair-GPTQ has minimal impact on performance, preserving at least 90% of baseline accuracy on zero-shot benchmarks; it also reduces unfairness relative to a half-precision model and retains the memory and speed benefits of 4-bit quantization. We also compare Fair-GPTQ with existing debiasing methods and find that it performs on par with the iterative null-space projection debiasing approach on racial-stereotype benchmarks. Overall, the results validate our theoretical solution to the quantization problem with a group-bias term, highlight its applicability for reducing group bias at quantization time in generative models, and demonstrate that our approach can further be used to analyze channel- and weight-level contributions to fairness during quantization.
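
As a rough illustration of what "adding a group-bias term to the quantization objective" could look like, the sketch below scores a quantized weight row by its input-weight product error plus a penalty on output differences between paired inputs. Everything specific here is an assumption for illustration and is not taken from the abstract: the form of the penalty, the weighting factor `alpha`, the paired inputs `X0` and `X1` (imagined as activations for sentences that differ only in the protected group), and the helper names. Plain round-to-nearest stands in for the quantizer; GPTQ itself uses second-order error compensation rather than simple rounding.

```python
def quantize_rtn(w, bits=4):
    """Symmetric round-to-nearest quantization of one weight row (a stand-in, not GPTQ)."""
    qmax = 2 ** (bits - 1) - 1                     # 7 for signed 4-bit
    max_abs = max(abs(v) for v in w)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the integer grid, clamp to the representable range, then dequantize.
    return [max(-qmax - 1, min(qmax, round(v / scale))) * scale for v in w]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def objective(w, w_q, X, X0, X1, alpha):
    """Reconstruction error plus a hypothetical group-bias penalty for one weight row.

    X      : calibration input vectors
    X0, X1 : paired input vectors assumed to differ only in the protected group
    alpha  : assumed trade-off weight between the two terms
    """
    # Input-weight product error, the quantity GPTQ-style methods minimize.
    recon = sum((dot(w, x) - dot(w_q, x)) ** 2 for x in X)
    # Assumed bias term: quantized outputs should not separate the paired inputs.
    bias = sum((dot(w_q, a) - dot(w_q, b)) ** 2 for a, b in zip(X0, X1))
    return recon + alpha * bias


if __name__ == "__main__":
    w = [0.62, -0.31, 0.08, 0.45]
    X = [[1.0, 0.5, -0.2, 0.3], [0.1, -0.4, 0.9, 0.0]]
    X0 = [[1.0, 0.0, 0.2, 0.1]]
    X1 = [[0.0, 1.0, 0.2, 0.1]]
    w_q = quantize_rtn(w)
    print("plain objective:     ", objective(w, w_q, X, X0, X1, alpha=0.0))
    print("bias-aware objective:", objective(w, w_q, X, X0, X1, alpha=0.5))
```

Setting `alpha` to zero recovers the plain reconstruction objective, so under these assumptions the bias-aware variant is a strict extension of it; a method in this spirit would search over rounding choices to lower the combined score rather than merely evaluate it.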
