While large language models (LLMs) pre-trained on massive amounts of unpaired language data have reached the state of the art in machine translation (MT) of general-domain texts, post-editing (PE) is still required to correct errors and to enhance term-translation quality in specialized domains. In this paper we present a pilot study on enhancing translation memories (TMs) produced by PE (source segments, machine translations, and reference translations, henceforth called PE-TM) for correct and consistent term translation in technical domains. We investigate a lightweight two-step scenario: at inference time, a human translator marks errors in the first-pass translation, and in a second step a few similar examples are retrieved from the PE-TM to prompt an LLM. Our experiment shows that the additional effort of augmenting translations with human error markings guides the LLM to focus on correcting the marked errors, yielding consistent improvements over automatic PE (APE) and over MT from scratch.
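The two-step scenario above can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the error-marking tags, the similarity measure (token-level Jaccard overlap), the prompt layout, and all PE-TM entries are invented for the example.

```python
def similarity(a: str, b: str) -> float:
    """Crude lexical overlap (Jaccard on whitespace tokens) between two source segments."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def build_prompt(pe_tm, source, marked_mt, k=2):
    """Retrieve the k PE-TM entries most similar to `source` and format them
    as few-shot examples, followed by the new segment with its error markings."""
    examples = sorted(pe_tm, key=lambda e: similarity(e["src"], source), reverse=True)[:k]
    lines = []
    for e in examples:
        lines += [
            f"Source: {e['src']}",
            f"Translation with marked errors: {e['marked_mt']}",
            f"Corrected translation: {e['ref']}",
            "",
        ]
    lines += [
        f"Source: {source}",
        f"Translation with marked errors: {marked_mt}",
        "Corrected translation:",
    ]
    return "\n".join(lines)

# Toy PE-TM with invented entries; <err>...</err> marks a terminology error.
pe_tm = [
    {"src": "Tighten the drive belt.",
     "marked_mt": "Ziehen Sie den <err>Antriebsgurt</err> fest.",
     "ref": "Ziehen Sie den Antriebsriemen fest."},
    {"src": "Replace the air filter.",
     "marked_mt": "Ersetzen Sie den Luftfilter.",
     "ref": "Ersetzen Sie den Luftfilter."},
]

prompt = build_prompt(
    pe_tm,
    "Check the drive belt tension.",
    "Prüfen Sie die Spannung des <err>Antriebsgurts</err>.",
)
print(prompt)
```

The prompt ends with the human-marked first-pass translation, so the LLM's completion is the corrected second-pass translation; the retrieved in-domain examples demonstrate how marked terms were fixed in previous post-edits.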