This paper explores the impact of different back-translation approaches on machine translation for Ladin, specifically the Val Badia variant. Given the limited parallel data available for this language (only 18k Ladin-Italian sentence pairs), we investigate the performance of a multilingual neural machine translation model fine-tuned for Ladin-Italian. In addition to the available authentic data, we synthesise further translations using three different models: a fine-tuned neural model, a rule-based system developed specifically for this language pair, and a large language model. Our experiments show that all approaches achieve comparable translation quality in this low-resource scenario, yet round-trip translation experiments reveal differences in the models' performance.