Nearest Neighbor Machine Translation is Meta-Optimizer on Output Projection Layer

Nearest Neighbor Machine Translation ($k$NN-MT) has achieved great success in domain adaptation tasks by integrating pre-trained Neural Machine Translation (NMT) models with domain-specific token-level retrieval. However, the reasons underlying its success have not been thoroughly investigated. In this paper, we comprehensively analyze $k$NN-MT through theoretical and empirical studies. First, we provide new insights into the working mechanism of $k$NN-MT, interpreting it as an efficient technique for implicitly executing gradient descent on the output projection layer of NMT, which indicates that it is a specific case of model fine-tuning. Next, we conduct multi-domain experiments and word-level analysis to examine the performance differences between $k$NN-MT and entire-model fine-tuning. Our findings suggest that: (1) incorporating $k$NN-MT with adapters yields translation performance comparable to fine-tuning on in-domain test sets, while achieving better performance on out-of-domain test sets; (2) fine-tuning significantly outperforms $k$NN-MT on the recall of in-domain low-frequency words, but this gap could be bridged by optimizing the context representations with additional adapter layers.
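The gradient-descent interpretation above can be illustrated with a small NumPy sketch. It is not the authors' code or their exact derivation; it only shows the two ingredients under stated assumptions. First, a standard $k$NN-MT step: retrieve the $k$ datastore entries (key = decoder context vector, value = target token) closest in squared L2 distance to the query, turn negative distances into weights with a temperature softmax, and interpolate the resulting distribution with the NMT softmax using a weight `lam`. Second, the generic duality between a linear layer and attention: after one gradient step of cross-entropy on the output projection `W` over all datastore pairs, the new logits equal the old logits plus a sum over datastore entries weighted by dot products between stored keys and the query. All function names, dimensions, and hyperparameter values (`k`, `temperature`, `lam`, `lr`) are hypothetical choices for illustration. Note that the dual form uses unnormalized dot products over the whole datastore, whereas $k$NN-MT uses a softmax over the top-$k$ distances, so the correspondence shown here is only schematic.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def knn_mt_distribution(q, W, keys, values, k=4, temperature=10.0, lam=0.5):
    """Interpolate the NMT softmax with a kNN distribution from a datastore.

    q:      (d,)   decoder context representation at the current step
    W:      (V, d) output projection matrix of the NMT model
    keys:   (N, d) stored context representations (datastore keys)
    values: (N,)   stored target-token ids (datastore values)
    """
    vocab_size = W.shape[0]
    p_nmt = softmax(W @ q)
    # Squared L2 distance from the query to every datastore key.
    dists = ((keys - q) ** 2).sum(axis=1)
    nearest = np.argsort(dists)[:k]
    # Weights over the k neighbours: softmax of negative scaled distances.
    weights = softmax(-dists[nearest] / temperature)
    p_knn = np.zeros(vocab_size)
    # Unbuffered add, so neighbours sharing a token id accumulate correctly.
    np.add.at(p_knn, values[nearest], weights)
    return lam * p_knn + (1.0 - lam) * p_nmt


def one_step_finetuned_logits(q, W, keys, values, lr=0.1):
    """Logits for q after one gradient step on W (summed cross-entropy on the datastore)."""
    vocab_size = W.shape[0]
    probs = softmax(keys @ W.T)             # (N, V) predictions on stored contexts
    targets = np.eye(vocab_size)[values]    # (N, V) one-hot stored tokens
    grad = (probs - targets).T @ keys       # (V, d) gradient of the summed loss
    return (W - lr * grad) @ q


def dual_form_logits(q, W, keys, values, lr=0.1):
    """The same logits, written as original logits plus an attention-like sum."""
    vocab_size = W.shape[0]
    errors = np.eye(vocab_size)[values] - softmax(keys @ W.T)  # (N, V) error signals
    scores = keys @ q                                           # (N,) key-query dot products
    return W @ q + lr * errors.T @ scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, V, N = 8, 20, 50
    W = rng.normal(size=(V, d))
    keys = rng.normal(size=(N, d))
    values = rng.integers(0, V, size=N)
    q = rng.normal(size=d)

    p = knn_mt_distribution(q, W, keys, values)
    print(round(float(p.sum()), 6))  # 1.0
    print(np.allclose(one_step_finetuned_logits(q, W, keys, values),
                      dual_form_logits(q, W, keys, values)))  # True
```

The check at the end confirms that the two ways of computing the post-update logits agree: one explicit gradient step on `W`, and the "retrieval" view in which each stored pair contributes its error signal scaled by its similarity to the query. This is the sense in which memorizing context-token pairs and updating the output projection layer can be seen as two forms of the same computation.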