This exploratory study investigates the potential of multilingual Automatic Post-Editing (APE) systems to improve machine translation quality for low-resource Indo-Aryan languages. Focusing on two closely related language pairs, English-Marathi and English-Hindi, we exploit their linguistic similarities to develop a robust multilingual APE model. To facilitate cross-lingual transfer, we generate synthetic Hindi-Marathi and Marathi-Hindi APE triplets. We additionally incorporate a Quality Estimation (QE)-APE multi-task learning framework. The experimental results underline the complementary nature of APE and QE, and we also observe that QE-APE multi-task learning facilitates effective domain adaptation. The multilingual APE models outperform their corresponding English-Hindi and English-Marathi single-pair models by $2.5$ and $2.39$ TER points, respectively. Further improvements over the multilingual APE model come from multi-task learning ($+1.29$ and $+1.44$ TER points), data augmentation ($+0.53$ and $+0.45$ TER points), and domain adaptation ($+0.35$ and $+0.45$ TER points). We publicly release the synthetic data, code, and models produced in this study at https://github.com/cfiltnlp/Multilingual-APE.
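To make the reported TER-point gains concrete, the following minimal sketch shows how an APE triplet (source, machine translation, post-edit) might be represented and how an improvement in TER points could be measured. It is an illustration only: the names `ApeTriplet`, `word_edit_distance`, and `simplified_ter` are hypothetical, the example sentences are invented, and the score is a simplified TER that counts word-level insertions, deletions, and substitutions but omits the block shifts of full TER. It does not reproduce the evaluation tooling used in the study.

```python
from dataclasses import dataclass


@dataclass
class ApeTriplet:
    """Hypothetical container for one APE training example."""
    src: str  # source sentence
    mt: str   # raw machine translation of the source
    pe: str   # human (or synthetic) post-edit, used as the reference


def word_edit_distance(hyp: list[str], ref: list[str]) -> int:
    """Word-level Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete a hypothesis word
                            curr[j - 1] + 1,      # insert a reference word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def simplified_ter(hyp: str, ref: str) -> float:
    """Edits per reference word, times 100. Assumption: no shift edits."""
    hyp_tokens, ref_tokens = hyp.split(), ref.split()
    if not ref_tokens:
        return 0.0 if not hyp_tokens else 100.0
    return 100.0 * word_edit_distance(hyp_tokens, ref_tokens) / len(ref_tokens)


# Invented example: the MT output drops one word that the post-edit restores.
triplet = ApeTriplet(
    src="(source sentence)",
    mt="the cat sat on mat",
    pe="the cat sat on the mat",
)
ape_output = "the cat sat on the mat"  # hypothetical APE system output

baseline = simplified_ter(triplet.mt, triplet.pe)
post_edited = simplified_ter(ape_output, triplet.pe)
gain_in_ter_points = baseline - post_edited  # positive means the APE output is closer
```

Because lower TER means fewer edits are needed to reach the reference, a gain in "TER points" corresponds to a reduction in the score, which is why the sketch subtracts the post-edited score from the baseline.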