While large language models (LLMs) pre-trained on massive amounts of unpaired language data have reached the state of the art in machine translation (MT) of general-domain texts, post-editing (PE) is still required to correct errors and to enhance term-translation quality in specialized domains. In this paper we present a pilot study on enhancing translation memories (TMs) produced by PE (source segments, machine translations, and reference translations, henceforth called PE-TM) for correct and consistent term translation in technical domains. We investigate a lightweight two-step scenario: at inference time, a human translator marks errors in the first-pass translation, and in a second step a few similar examples are retrieved from the PE-TM to prompt an LLM. Our experiment shows that the additional effort of augmenting translations with human error markings guides the LLM to focus on correcting the marked errors, yielding consistent improvements over automatic PE (APE) and over MT from scratch.
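The two-step scenario above can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the error-marking tags, the similarity measure (token-level Jaccard overlap), the prompt layout, and all PE-TM entries are invented for the example.

```python
def similarity(a: str, b: str) -> float:
    """Crude lexical overlap (Jaccard on whitespace tokens) between two source segments."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def build_prompt(pe_tm, source, marked_mt, k=2):
    """Retrieve the k PE-TM entries most similar to `source` and format them
    as few-shot examples, followed by the new segment with its error markings."""
    examples = sorted(pe_tm, key=lambda e: similarity(e["src"], source), reverse=True)[:k]
    lines = []
    for e in examples:
        lines += [
            f"Source: {e['src']}",
            f"Translation with marked errors: {e['marked_mt']}",
            f"Corrected translation: {e['ref']}",
            "",
        ]
    lines += [
        f"Source: {source}",
        f"Translation with marked errors: {marked_mt}",
        "Corrected translation:",
    ]
    return "\n".join(lines)

# Toy PE-TM with invented entries; <err>...</err> marks a terminology error.
pe_tm = [
    {"src": "Tighten the drive belt.",
     "marked_mt": "Ziehen Sie den <err>Antriebsgurt</err> fest.",
     "ref": "Ziehen Sie den Antriebsriemen fest."},
    {"src": "Replace the air filter.",
     "marked_mt": "Ersetzen Sie den Luftfilter.",
     "ref": "Ersetzen Sie den Luftfilter."},
]

prompt = build_prompt(
    pe_tm,
    "Check the drive belt tension.",
    "Prüfen Sie die Spannung des <err>Antriebsgurts</err>.",
)
print(prompt)
```

The prompt ends with the human-marked first-pass translation, so the LLM's completion is the corrected second-pass translation; the retrieved in-domain examples demonstrate how marked terms were fixed in previous post-edits.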