Naive joint training of large language models (LLMs) for multilingual preference alignment can suffer from negative interference, a known issue in multilingual training where conflicting objectives degrade overall performance. However, the impact of this phenomenon on multilingual preference alignment remains largely underexplored. To address this issue, we propose CONGRAD, a scalable and effective filtering method that selects high-quality preference samples with minimal gradient conflicts across languages. Our method leverages gradient surgery to retain samples aligned with an aggregated multilingual update direction. Additionally, we incorporate a sublinear gradient compression strategy that reduces memory overhead during gradient accumulation. We integrate CONGRAD into the self-rewarding framework and evaluate it on LLaMA3-8B and Gemma2-2B across 10 languages. Results show that CONGRAD consistently outperforms strong baselines on both seen and unseen languages, with minimal alignment tax.
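The core selection idea can be illustrated with a minimal sketch: score each candidate sample by the alignment of its gradient with an aggregated multilingual update direction, and keep the least-conflicting ones. This is not the paper's implementation; the function name, mean aggregation, and the fixed keep ratio are illustrative assumptions, and the gradient compression step is only noted in a comment.

```python
import numpy as np

def select_low_conflict_samples(sample_grads, keep_ratio=0.5):
    """Keep the samples whose gradients best align with the aggregate direction.

    sample_grads: array of shape (n_samples, dim) holding per-sample gradient
    vectors (in practice these could be compressed, e.g. by random projection,
    to keep memory sublinear in model size).
    """
    # Aggregated multilingual update direction (mean is an assumed choice).
    agg = sample_grads.mean(axis=0)
    # Alignment score: dot product with the aggregate; negative = conflicting.
    scores = sample_grads @ agg
    # Retain the top fraction of most-aligned samples.
    k = max(1, int(len(scores) * keep_ratio))
    return np.argsort(-scores)[:k]

# Toy usage on random "gradients".
rng = np.random.default_rng(0)
grads = rng.normal(size=(8, 16))
kept = select_low_conflict_samples(grads, keep_ratio=0.5)
```

In this sketch, filtering is a one-shot ranking; the actual method applies gradient surgery during training, but the dot-product alignment test captures the selection criterion.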