Building machine translation (MT) systems for low-resource languages is notably difficult due to the scarcity of high-quality data. Although Large Language Models (LLMs) have improved the performance of MT systems, adapting them to lesser-represented languages remains challenging. In-context learning (ICL) may offer novel ways to adapt LLMs for low-resource MT by conditioning models on demonstrations at inference time. In this study, we explore scaling ICL for low-resource MT beyond the few-shot setting to thousands of examples with long-context models. We scale the in-context token budget up to 1M tokens and compare three types of training corpora used as in-context supervision: monolingual unsupervised data, instruction-style data, and parallel data (English--target and Indonesian--target). Our experiments on Javanese and Sundanese show that gains from additional context saturate quickly and that performance can degrade near the maximum context window, with scaling behavior strongly dependent on corpus type. Notably, some forms of monolingual supervision can be competitive with parallel data, despite the latter offering additional supervision. Overall, our results characterize the effective limits and corpus-type sensitivity of long-context ICL for low-resource MT, highlighting that larger context windows do not necessarily yield proportional quality gains.
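To make the many-shot setup concrete, the sketch below illustrates how parallel demonstrations could be packed into a single prompt up to a fixed in-context token budget. This is a minimal illustration, not the authors' implementation: the `count_tokens` helper, the prompt format, and the example sentences are assumptions for exposition only.

```python
# Minimal sketch of many-shot ICL prompt packing for low-resource MT.
# All names (count_tokens, build_prompt, the English/Javanese labels)
# are illustrative assumptions, not the paper's actual code.

def count_tokens(text: str) -> int:
    # Placeholder tokenizer: whitespace count stands in for a real
    # subword tokenizer when estimating the context budget.
    return len(text.split())

def build_prompt(demos, src_sentence, budget_tokens=1_000_000):
    """Pack (source, target) demonstrations into the context until the
    token budget is exhausted, then append the sentence to translate."""
    header = "Translate English into Javanese.\n\n"
    parts, used = [header], count_tokens(header)
    for src, tgt in demos:
        block = f"English: {src}\nJavanese: {tgt}\n\n"
        cost = count_tokens(block)
        if used + cost > budget_tokens:
            break  # stop once the next demonstration would exceed the budget
        parts.append(block)
        used += cost
    parts.append(f"English: {src_sentence}\nJavanese:")
    return "".join(parts)

# Toy usage with two demonstrations and a small budget.
demos = [("Good morning.", "Sugeng enjing."), ("Thank you.", "Matur nuwun.")]
print(build_prompt(demos, "How are you?", budget_tokens=512))
```

Under this framing, the study's comparison amounts to swapping what fills the packed context (monolingual text, instruction-style data, or parallel pairs) while varying the token budget.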