Rewriting source text with large language models (LLMs) before translation has been shown to improve machine translation (MT) quality. However, we find that prompt-based rewriting can degrade translation quality rather than improve it, particularly when smaller LLMs, such as 4B-parameter models, are used. We argue that this limitation stems from the difficulty of controlling rewriting behavior through natural-language prompts alone: a rewrite is useful only if it improves downstream translation, yet existing prompt-based methods do not explicitly optimize for this signal. To address this issue, we propose RLSR (Reinforcement Learning for Source Rewriting), a reinforcement learning framework that trains the rewriting model with a reward based on the downstream translation-quality improvement produced by each rewrite. Experiments across six MT systems and 16 language pairs show that our 4B RLSR-trained rewriting models significantly outperform both the no-rewriting baseline and prompt-based rewriting baselines at the same model scale, while remaining competitive with baselines that use a 235B LLM.
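The reward signal described above can be illustrated with a minimal sketch. It assumes the reward is the plain difference between the quality of the translation of the rewrite and the quality of the translation of the original source; the abstract says only that the reward is based on this improvement, so the exact form may differ. The names `rewrite_reward`, `translate`, and `quality` are hypothetical placeholders: `translate` stands in for any MT system, and `quality` for any translation-quality metric that scores a hypothesis against the original source.

```python
from typing import Callable


def rewrite_reward(
    source: str,
    rewrite: str,
    translate: Callable[[str], str],
    quality: Callable[[str, str], float],
) -> float:
    """Reward for a candidate rewrite (illustrative assumption, not the paper's exact formula).

    translate: hypothetical MT system, maps input text to a translation.
    quality:   hypothetical metric, scores a translation against the ORIGINAL
               source, so a rewrite that changes the meaning is not rewarded.
    """
    # Quality obtained when the MT system translates the unmodified source.
    baseline_score = quality(source, translate(source))
    # Quality obtained when the MT system translates the rewritten source.
    rewritten_score = quality(source, translate(rewrite))
    # Positive if the rewrite helped, negative if it hurt, zero if no change.
    return rewritten_score - baseline_score
```

Under this assumption, a rewrite that leaves the translation unchanged earns zero reward and a rewrite that degrades the translation is penalized, which is the behavior that prompt-only rewriting cannot enforce.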