Nearest neighbor machine translation augments Autoregressive Translation~(AT) with $k$-nearest-neighbor retrieval, comparing the token-level context representation of each target token in the query against those stored in a datastore. However, token-level representations may introduce noise when translating ambiguous words, or fail to yield accurate retrieval results when the representations generated by the model contain indistinguishable context information, as in Non-Autoregressive Translation~(NAT) models. In this paper, we propose a novel $n$-gram nearest neighbor retrieval method that is model-agnostic and applicable to both AT and NAT models. Specifically, we concatenate the hidden representations of $n$ adjacent tokens to form the key, and use the tuple of corresponding target tokens as the value. At inference time, we propose tailored decoding algorithms for AT and NAT models respectively. We demonstrate that the proposed method consistently outperforms the token-level method on both AT and NAT models, on general as well as domain adaptation translation tasks. On domain adaptation, the proposed method improves the average BLEU score by $1.03$ and $2.76$ on AT and NAT models respectively.
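The key/value construction described above can be illustrated with a minimal sketch. It is not the authors' implementation. The function names (`build_ngram_datastore`, `retrieve`), the L2 distance, and the brute-force search are assumptions made for illustration only; the abstract does not specify the similarity measure or the index, and the tailored AT and NAT decoding algorithms are not reproduced here.

```python
from math import sqrt
from typing import List, Sequence, Tuple

Vector = List[float]
Entry = Tuple[Vector, Tuple[str, ...]]


def build_ngram_datastore(
    hidden: Sequence[Vector], tokens: Sequence[str], n: int
) -> List[Entry]:
    """Pair each window of n adjacent hidden states with its n target tokens.

    Key: the n hidden vectors concatenated into one flat vector.
    Value: the tuple of the n corresponding target tokens.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if len(hidden) != len(tokens):
        raise ValueError("need exactly one hidden vector per target token")
    entries: List[Entry] = []
    for start in range(len(tokens) - n + 1):
        # Flatten the n adjacent hidden vectors into a single key.
        key = [x for vec in hidden[start:start + n] for x in vec]
        value = tuple(tokens[start:start + n])
        entries.append((key, value))
    return entries


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # Assumed metric; the abstract only speaks of "similarity".
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(
    datastore: Sequence[Entry], query: Sequence[float], k: int
) -> List[Tuple[Tuple[str, ...], float]]:
    """Brute-force k-nearest-neighbor lookup over the n-gram keys."""
    scored = [(value, l2_distance(key, query)) for key, value in datastore]
    scored.sort(key=lambda item: item[1])
    return scored[:k]


if __name__ == "__main__":
    # Toy 2-dimensional hidden states for a three-token target sentence.
    hidden_states = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
    target_tokens = ["the", "cat", "sat"]
    store = build_ngram_datastore(hidden_states, target_tokens, n=2)

    # A query is itself a concatenation of n adjacent hidden states.
    query_key = [1.0, 0.1, 0.9, 1.0]
    for ngram, dist in retrieve(store, query_key, k=1):
        print(ngram, round(dist, 3))
```

With $n = 2$, the three-token toy sentence yields two datastore entries, and each retrieved neighbor carries a whole token tuple rather than a single token. A practical system would replace the linear scan with an approximate nearest neighbor index.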