Nearest neighbor machine translation augments Autoregressive Translation~(AT) with $k$-nearest-neighbor retrieval, comparing the token-level context representations of the target tokens in the query against those in the datastore. However, token-level representations may introduce noise when translating ambiguous words, or fail to yield accurate retrieval results when the representations generated by the model contain indistinguishable context information, as in Non-Autoregressive Translation~(NAT) models. In this paper, we propose a novel $n$-gram nearest neighbor retrieval method that is model agnostic and applicable to both AT and NAT models. Specifically, we concatenate the adjacent $n$-gram hidden representations as the key, while the tuple of corresponding target tokens is the value. At inference time, we propose tailored decoding algorithms for AT and NAT models respectively. We demonstrate that the proposed method consistently outperforms the token-level method on both AT and NAT models, on general as well as domain adaptation translation tasks. On domain adaptation, the proposed method improves the average BLEU score by $1.03$ and $2.76$ on AT and NAT models respectively.
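The key/value construction described in the abstract can be illustrated with a minimal sketch. It assumes a brute-force search with squared L2 distance, which the abstract does not specify, and the names `build_ngram_datastore` and `knn_retrieve` are hypothetical. It covers only datastore construction and lookup, not the tailored AT and NAT decoding algorithms.

```python
import numpy as np


def build_ngram_datastore(hidden, tokens, n=2):
    """Build (keys, values) from one target sentence.

    hidden: (T, d) array, one hidden representation per target token.
    tokens: length-T sequence of target token ids.
    Each key concatenates n adjacent hidden states into an (n*d,) vector;
    each value is the tuple of the n corresponding target tokens.
    """
    keys, values = [], []
    for end in range(n, len(tokens) + 1):
        start = end - n
        keys.append(hidden[start:end].reshape(-1))
        values.append(tuple(tokens[start:end]))
    return np.stack(keys), values


def knn_retrieve(keys, values, query, k=1):
    """Return the k nearest n-gram values with their distances.

    Assumption: squared L2 distance with exhaustive search; a real
    system would likely use an approximate nearest neighbor index.
    """
    dists = np.sum((keys - query) ** 2, axis=1)
    nearest = np.argsort(dists)[:k]
    return [(values[j], float(dists[j])) for j in nearest]
```

For a sentence with four target tokens and $n = 2$, this yields three entries, one per bigram. A query is built the same way, by concatenating the hidden states of the $n$ adjacent positions being decoded.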