This study investigates machine translation between related languages, i.e., languages within the same family that share linguistic traits such as word order and lexical similarity. Machine translation through few-shot prompting uses a small set of example translation pairs to generate translations for test sentences. This requires the model to learn how to translate while also preserving the correct token order, so that the output is fluent and accurate. We propose that, for related languages, the task of machine translation can be simplified by exploiting the monotonic alignment characteristic of such languages. We introduce a novel few-shot prompting approach that decomposes the translation process into a sequence of word-chunk translations. Through evaluations on multiple related language pairs across various language families, we demonstrate that this decomposed prompting approach surpasses several established few-shot baselines, verifying its effectiveness. For example, it outperforms the strong few-shot prompting BLOOM model by an average of 4.2 chrF++ points across the examined languages.
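To make the idea of decomposing a translation into word-chunk translations concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes fixed-size chunks of `k` words, a simple "Source:/Target:" few-shot prompt format, and a hypothetical `translate_chunk` callable standing in for a few-shot prompted language model. Chunk outputs are concatenated in source order, which is only reasonable under the monotonic-alignment assumption described above. The toy model in the usage example simply upper-cases the chunk so the code runs without any real model.

```python
from typing import Callable, List, Tuple


def chunk_words(sentence: str, k: int) -> List[str]:
    """Split a sentence into consecutive chunks of at most k whitespace-separated words."""
    if k < 1:
        raise ValueError("chunk size k must be >= 1")
    words = sentence.split()
    return [" ".join(words[i:i + k]) for i in range(0, len(words), k)]


def build_chunk_prompt(examples: List[Tuple[str, str]], chunk: str,
                       src: str = "Source", tgt: str = "Target") -> str:
    """Format few-shot chunk-pair examples, then the chunk to translate (assumed prompt format)."""
    blocks = [f"{src}: {s}\n{tgt}: {t}" for s, t in examples]
    blocks.append(f"{src}: {chunk}\n{tgt}:")
    return "\n\n".join(blocks)


def decomposed_translate(sentence: str,
                         examples: List[Tuple[str, str]],
                         translate_chunk: Callable[[str], str],
                         k: int = 3) -> str:
    """Translate chunk by chunk and join the results in source order.

    Keeping the source order relies on the assumed monotonic alignment
    between closely related languages.
    """
    outputs = []
    for chunk in chunk_words(sentence, k):
        prompt = build_chunk_prompt(examples, chunk)
        # translate_chunk is a hypothetical stand-in for a prompted LLM call.
        outputs.append(translate_chunk(prompt).strip())
    return " ".join(outputs)


if __name__ == "__main__":
    # Toy "model": read the last source chunk from the prompt and upper-case it.
    def toy_model(prompt: str) -> str:
        return prompt.rsplit("Source: ", 1)[1].split("\n")[0].upper()

    print(decomposed_translate("a b c d e", [("x y", "X Y")], toy_model, k=2))
    # prints "A B C D E"
```

A real system would replace `toy_model` with a call to a few-shot prompted model and might give each chunk additional context from its neighbours; the fixed chunk size here is purely illustrative.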