Large language models (LLMs) have demonstrated competitive performance in zero-shot multilingual machine translation (MT). Follow-up work has further improved MT performance via preference optimization, but leaves a key aspect largely underexplored: the order in which data samples are presented during training. We address this gap by integrating curriculum learning into several state-of-the-art preference optimization algorithms to boost MT performance. We introduce a novel curriculum learning strategy with restarts (CLewR), which repeats the easy-to-hard curriculum multiple times during training to mitigate catastrophic forgetting of easy examples. We demonstrate consistent gains across several model families (Gemma2, Qwen2.5, Llama3.1) and preference optimization techniques. We publicly release our code at https://github.com/alexandra-dragomir/CLewR.
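As a rough illustration of the idea of repeating an easy-to-hard curriculum, the following minimal sketch orders samples by a difficulty score and replays that ordering several times. It is not taken from the released code: the function name `curriculum_with_restarts`, the `difficulty` callable, the `num_restarts` and `batch_size` parameters, and the use of source length as a difficulty proxy are all assumptions made for this example. The actual difficulty measure and restart schedule used by CLewR are not specified in this abstract.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def curriculum_with_restarts(
    samples: Sequence[T],
    difficulty: Callable[[T], float],
    num_restarts: int,
    batch_size: int,
) -> Iterator[List[T]]:
    """Yield batches in easy-to-hard order, replaying the full pass
    `num_restarts` times (hypothetical helper, for illustration only)."""
    # Sort once from easiest to hardest; sorted() is stable, so ties
    # keep their original relative order.
    ordered = sorted(samples, key=difficulty)
    for _ in range(num_restarts):
        # Each restart goes back to the easiest samples, so easy
        # examples are revisited after the hard ones.
        for start in range(0, len(ordered), batch_size):
            yield list(ordered[start:start + batch_size])


# Toy data: word count stands in for difficulty (an assumption).
toy_samples = ["a b c d", "a", "a b c", "a b"]
batches = list(
    curriculum_with_restarts(
        toy_samples,
        difficulty=lambda s: len(s.split()),
        num_restarts=2,
        batch_size=2,
    )
)
```

In a preference optimization setting, each yielded batch would be passed to the training step in this order instead of being shuffled. How the ordering interacts with a particular trainer's data loader depends on that trainer and is left out of the sketch.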