Large language models (LLMs) have demonstrated competitive performance in zero-shot multilingual machine translation (MT). Some follow-up works further improved MT performance via preference optimization, but they leave a key aspect largely underexplored: the order in which data samples are presented during training. We address this topic by integrating curriculum learning into various state-of-the-art preference optimization algorithms to boost MT performance. We introduce a novel curriculum learning strategy with restarts (CLewR), which reiterates an easy-to-hard curriculum multiple times during training to effectively mitigate the catastrophic forgetting of easy examples. We demonstrate consistent gains across several model families (Gemma2, Qwen2.5, Llama3.1) and preference optimization techniques. We publicly release our code at https://github.com/alexandra-dragomir/CLewR.
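The restart idea in the abstract can be sketched minimally as reordering a training set from easy to hard and then repeating that pass several times, so that easy examples are revisited after each restart. The following is an illustrative sketch, not the authors' implementation: the function name `clewr_order`, the difficulty scorer, and the restart count are all assumptions.

```python
def clewr_order(examples, difficulty, num_restarts=3):
    """Sketch of a curriculum with restarts.

    `examples`: list of training samples.
    `difficulty`: callable mapping a sample to a scalar score (lower = easier);
    the actual scoring used by CLewR is not specified here and is an assumption.
    `num_restarts`: how many easy-to-hard passes to make over the data.
    """
    easy_to_hard = sorted(examples, key=difficulty)
    order = []
    for _ in range(num_restarts):
        # Each restart begins the curriculum again, revisiting easy examples
        # to counter catastrophic forgetting of easy samples.
        order.extend(easy_to_hard)
    return order

# Toy usage: treat sample length as a stand-in difficulty score.
samples = ["ab", "abcdef", "abcd"]
print(clewr_order(samples, difficulty=len, num_restarts=2))
# → ['ab', 'abcd', 'abcdef', 'ab', 'abcd', 'abcdef']
```

In practice the ordered list would feed a (non-shuffling) data loader for the chosen preference optimization algorithm.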