Traditionally, success in multilingual machine translation can be attributed to three key factors in training data: large volume, diverse translation directions, and high quality. In the current practice of fine-tuning large language models (LLMs) for translation, we revisit the importance of all these factors. We find that LLMs display strong translation capability after being fine-tuned on as few as 32 training instances, and that fine-tuning on a single translation direction effectively enables LLMs to translate in multiple directions. However, the choice of direction is critical: fine-tuning LLMs with English on the target side can lead to task misinterpretation, which hinders translations into non-English languages. A similar problem arises when noise is introduced into the target side of parallel data, especially when the target language is well-represented in the LLM's pre-training. In contrast, noise in an under-represented language has a less pronounced effect. Our findings suggest that attaining successful alignment hinges on teaching the model to maintain a "superficial" focus, thereby avoiding the learning of erroneous biases beyond translation.
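The abstract does not specify an implementation, but the setup it describes (supervised fine-tuning on as few as 32 parallel instances in a single translation direction) can be sketched roughly as below. The base model name, prompt template, toy data, and hyperparameters are illustrative assumptions, not the paper's actual configuration.

```python
# Minimal sketch, assuming a Hugging Face causal LLM: fine-tune on a tiny
# single-direction parallel set (here 32 German->English pairs), then probe
# whether translation in other directions also improves.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder base LLM; any causal LM works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# 32 parallel instances, one direction only. Note: English on the target side
# is the configuration the abstract flags as prone to task misinterpretation.
pairs = [("Guten Morgen.", "Good morning.")] * 32  # stand-in data, not the paper's corpus

def format_example(src: str, tgt: str) -> str:
    # Instruction-style prompt; the exact template is an assumption.
    return f"Translate German to English.\nGerman: {src}\nEnglish: {tgt}{tok.eos_token}"

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for epoch in range(3):
    for src, tgt in pairs:
        batch = tok(format_example(src, tgt), return_tensors="pt")
        # Standard causal-LM loss over the full sequence for brevity;
        # masking the prompt tokens from the loss is a common refinement.
        out = model(**batch, labels=batch["input_ids"])
        out.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```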