Improving Word Sense Disambiguation in Neural Machine Translation with Salient Document Context

Lexical ambiguity is a challenging and pervasive problem in machine translation (MT). We introduce a simple and scalable approach to resolve translation ambiguity by incorporating a small amount of extra-sentential context in neural MT. Our approach requires no sense annotation and no change to standard model architectures. Since actual document context is not available for the vast majority of MT training data, we collect related sentences for each input to construct pseudo-documents. Salient words from pseudo-documents are then encoded as a prefix to each source sentence to condition the generation of the translation. To evaluate, we release DocMuCoW, a challenge set for translation disambiguation based on the English-German MuCoW (Raganato et al., 2020) augmented with document IDs. Extensive experiments show that our method translates ambiguous source words better than strong sentence-level baselines and comparable document-level baselines while reducing training costs.
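
The prefixing idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how salience is scored or how the prefix is delimited, so the TF-IDF ranking, the `<sep>` separator token, and the function names `salient_words` and `add_context_prefix` below are all assumptions made for illustration, not the authors' implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def salient_words(pseudo_doc, corpus, k=3):
    """Rank the words of one pseudo-document by TF-IDF (an assumed salience measure).

    pseudo_doc: list of sentences related to the input sentence.
    corpus: list of pseudo-documents, used for document frequencies.
    """
    n_docs = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update({tok for sent in doc for tok in tokenize(sent)})
    tf = Counter(tok for sent in pseudo_doc for tok in tokenize(sent))
    # Smoothed IDF: words that occur in every document score zero.
    scores = {w: c * math.log((1 + n_docs) / (1 + df[w])) for w, c in tf.items()}
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


def add_context_prefix(source, pseudo_doc, corpus, k=3, sep="<sep>"):
    """Prepend the top-k salient words and a hypothetical separator to the source."""
    return " ".join(salient_words(pseudo_doc, corpus, k) + [sep, source])


finance_doc = ["The bank raised interest rates.", "Loans from the bank got expensive."]
river_doc = ["We walked along the river bank.", "The river was calm."]
corpus = [finance_doc, river_doc]

prefixed = add_context_prefix("She sat on the bank.", river_doc, corpus, k=1)
print(prefixed)  # river <sep> She sat on the bank.
```

The prefixed string would then be fed to an unmodified sequence-to-sequence model in place of the bare source sentence, which is consistent with the abstract's claim that no architectural change is needed.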

