Reflective Translation: Improving Low-Resource Machine Translation via Structured Self-Reflection

From arXiv: 12 pages, 3 figures, 6 tables. Accepted to the NeurIPS 2025 Workshop on Multilingual Representation Learning (Mexico City) and the AAAI 2025 Workshop on Language Models for Under-Resourced Communities (LM4UC). Code and data are available at: https://github.com/Nickcheng123/reflective-translation-mt

Low-resource languages such as isiZulu and isiXhosa face persistent challenges in machine translation due to limited parallel data and linguistic resources. Recent advances in large language models suggest that self-reflection, prompting a model to critique and revise its own outputs, can improve reasoning quality and factual consistency. Building on this idea, this paper introduces Reflective Translation, a prompt-based framework in which a model generates an initial translation, produces a structured self-critique, and then uses this reflection to generate a refined translation. The approach is evaluated on English-isiZulu and English-isiXhosa translation using OPUS-100 and NTREX-African, across multiple prompting strategies and confidence thresholds. Results show consistent improvements in both BLEU and COMET scores between first- and second-pass translations, with average gains of up to +0.22 BLEU and +0.18 COMET. Statistical significance testing using paired nonparametric tests confirms that these improvements are robust. The proposed method is model-agnostic, requires no fine-tuning, and introduces a reflection-augmented dataset that can support future supervised or analysis-driven work. These findings demonstrate that structured self-reflection is a practical and effective mechanism for improving translation quality in low-resource settings.
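The three-step loop described in the abstract (initial translation, structured self-critique, refined translation) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function `reflective_translate`, the `Generate` callable, the prompt wording, and the critique fields are all hypothetical, and the stub model exists only so the sketch runs without a real LLM.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical interface: any function that maps a prompt string to model text.
Generate = Callable[[str], str]


@dataclass
class ReflectiveResult:
    first_pass: str   # initial translation
    critique: str     # structured self-critique of the first pass
    second_pass: str  # refined translation conditioned on the critique


def reflective_translate(source: str, target_lang: str, generate: Generate) -> ReflectiveResult:
    """Run one translate -> critique -> refine cycle (illustrative prompts only)."""
    # Step 1: initial translation.
    first = generate(
        f"Translate the following English text into {target_lang}.\n"
        f"Text: {source}\nTranslation:"
    ).strip()

    # Step 2: structured self-critique. The field names are assumptions.
    critique = generate(
        f"Critique this {target_lang} translation of the English source.\n"
        f"Source: {source}\nTranslation: {first}\n"
        "List: (1) meaning errors, (2) grammar errors, (3) suggested fixes."
    ).strip()

    # Step 3: refined translation that uses the critique as extra context.
    second = generate(
        f"Revise the {target_lang} translation using the critique.\n"
        f"Source: {source}\nDraft: {first}\nCritique: {critique}\n"
        "Improved translation:"
    ).strip()

    return ReflectiveResult(first, critique, second)


def make_stub() -> Tuple[Generate, List[str]]:
    """Return a stand-in model that numbers its replies and records each prompt."""
    calls: List[str] = []

    def stub(prompt: str) -> str:
        calls.append(prompt)
        return f"output-{len(calls)}"

    return stub, calls


stub_model, recorded_prompts = make_stub()
result = reflective_translate("Good morning.", "isiZulu", stub_model)
```

In practice `generate` would wrap whichever LLM is being evaluated. Because the framework is prompt-based and needs no fine-tuning, swapping models only means swapping that callable. Scoring `first_pass` and `second_pass` against a reference is what would yield the first- versus second-pass comparison reported above.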
