Neural Machine Translation (NMT) models for low-resource languages suffer significant performance degradation under domain shift. We quantify this challenge using Dhao, an indigenous language of Eastern Indonesia with no digital footprint beyond the New Testament (NT). When applied to the unseen Old Testament (OT), a standard NMT model fine-tuned on the NT drops from an in-domain score of 36.17 chrF++ to 27.11 chrF++. To recover this loss, we introduce a hybrid framework where a fine-tuned NMT model generates an initial draft, which is then refined by a Large Language Model (LLM) using Retrieval-Augmented Generation (RAG). The final system achieves 35.21 chrF++ (+8.10 recovery), effectively matching the original in-domain quality. Our analysis reveals that this performance is driven primarily by the number of retrieved examples rather than the choice of retrieval algorithm. Qualitative analysis confirms the LLM acts as a robust "safety net," repairing severe failures in zero-shot domains.
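The hybrid pipeline described above (a fine-tuned NMT model drafts, then an LLM refines the draft grounded in retrieved in-domain examples) can be sketched roughly as follows. All names here (`retrieve_examples`, `build_refinement_prompt`) are illustrative assumptions, and the token-overlap retriever stands in for whatever retrieval algorithm is actually used; the abstract's finding is that the number of retrieved examples `k` matters more than this choice.

```python
def retrieve_examples(source, memory, k=5):
    """Rank (source, target) pairs from the in-domain translation memory
    (here: NT verse pairs) by token overlap with the new source sentence,
    and return the top-k. A stand-in for any retrieval algorithm."""
    src_tokens = set(source.lower().split())

    def overlap(pair):
        return len(src_tokens & set(pair[0].lower().split()))

    return sorted(memory, key=overlap, reverse=True)[:k]


def build_refinement_prompt(source, draft, examples):
    """Assemble a prompt asking the LLM to refine the NMT draft,
    grounded in the retrieved example translations (the RAG step)."""
    lines = ["Refine the draft translation, following these examples:"]
    for ex_src, ex_tgt in examples:
        lines.append(f"Source: {ex_src}\nTranslation: {ex_tgt}")
    lines.append(f"Source: {source}\nDraft: {draft}\nRefined translation:")
    return "\n\n".join(lines)


# Toy usage: retrieve the most lexically similar memory entry,
# then build the refinement prompt around the NMT draft.
memory = [
    ("in the beginning god created", "target A"),
    ("and it came to pass", "target B"),
]
examples = retrieve_examples("in the beginning", memory, k=1)
prompt = build_refinement_prompt("in the beginning", "nmt draft here", examples)
```

In the full system, `prompt` would be sent to the LLM, whose output replaces the draft; the abstract's qualitative finding is that this step mainly repairs the NMT model's severe out-of-domain failures rather than polishing already-good drafts.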