Neural machine translation (NMT) from Chinese to low-resource Southeast Asian languages remains severely constrained by the extreme scarcity of clean parallel corpora and the pervasive noise in existing mined data. This chronic shortage not only impedes effective model training but also sustains a large performance gap relative to high-resource translation directions. Despite recent advances in large multilingual models, millions of speakers of languages such as Lao, Burmese, and Tagalog are still left with persistently low-quality translation systems. We introduce \textbf{M}ultilingual \textbf{E}xpert-\textbf{R}eward \textbf{I}nformed \textbf{T}uning (\textbf{MERIT}), a unified translation framework. We evaluate it on a Chinese-centric suite covering five Southeast Asian low-resource languages (LRLs), which we build by transforming the traditional English-centric Asian Language Treebank (ALT) benchmark. MERIT combines language-specific token prefixing (LTP) and supervised fine-tuning (SFT) with a novel group relative policy optimization (GRPO) stage guided by the semantic alignment reward (SAR). Experimental results confirm that, in LRL{\textrightarrow}Chinese translation, targeted data curation and reward-guided optimization dramatically outperform mere model scaling.
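The abstract names three mechanisms (LTP, SAR, and GRPO) without defining them. The following is a minimal sketch under loudly stated assumptions, not the MERIT implementation. The tag strings in `LANG_TAGS` are hypothetical, and LTP is modeled simply as prepending a language token to the source. SAR is assumed to be the cosine similarity between hypothesis and reference sentence embeddings. Only the group-relative advantage follows the standard GRPO recipe: each sampled translation's reward is standardized against the mean and standard deviation of its own sampling group.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical language tags: the abstract does not specify MERIT's actual prefix format.
LANG_TAGS: Dict[str, str] = {"lo": "<lo>", "my": "<my>", "tl": "<tl>"}


def add_language_prefix(source: str, lang: str) -> str:
    """Assumed LTP input: prepend a language-specific token to the source sentence."""
    return f"{LANG_TAGS[lang]} {source}"


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def semantic_alignment_reward(hyp_emb: Sequence[float], ref_emb: Sequence[float]) -> float:
    """Assumed SAR: embedding similarity between a sampled translation and the reference."""
    return cosine(hyp_emb, ref_emb)


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: standardize each reward within its sampled group.

    Translations that score above the group mean receive positive advantages, and
    those below it receive negative ones. No learned value function is needed.
    """
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    print(add_language_prefix("sabaidee", "lo"))
    # Toy 2-d "embeddings" stand in for a real sentence encoder.
    reference = [1.0, 0.0]
    group = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    rewards = [semantic_alignment_reward(h, reference) for h in group]
    print(rewards, group_relative_advantages(rewards))
```

Because the advantages are computed relative to the group, a sampled translation is only pushed up if it aligns better with the reference than its siblings do. A group whose rewards are all identical therefore contributes (near-)zero gradient signal.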