Semi-parametric Nearest Neighbor Language Models ($k$NN-LMs) have produced impressive gains over purely parametric LMs by leveraging large-scale neighborhood retrieval over external memory datastores. However, there has been little investigation into adapting such models to new domains. This work attempts to fill that gap and proposes three approaches for adapting $k$NN-LMs: 1) adapting the underlying LM (using Adapters), 2) expanding neighborhood retrieval over an additional adaptation datastore, and 3) adapting the weights (scores) of retrieved neighbors using a learned Rescorer module. We study each adaptation strategy separately, as well as their combined performance improvement, through ablation experiments and an extensive set of evaluations run over seven adaptation domains. Our combined adaptation approach consistently outperforms purely parametric adaptation and zero-shot ($k$NN-LM) baselines that construct datastores from the adaptation data. On average across domains, we see perplexity improvements of 17.1% and 16% over these baselines, respectively.
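As an illustration of how retrieval over an additional adaptation datastore and neighbor rescoring could fit together, the following is a minimal sketch and not the implementation evaluated in this work. It assumes the standard $k$NN-LM formulation, in which a softmax over negative neighbor distances is interpolated with the parametric LM distribution using a weight `lam`. The brute-force `retrieve` function, the simple concatenation of neighbors from both datastores, the additive `rescore` hook (a stand-in for a learned Rescorer), and all toy keys and probabilities are hypothetical choices made for illustration only.

```python
import math
from collections import defaultdict


def retrieve(datastore, query, k):
    """Return the k nearest (token, squared L2 distance) pairs by brute force.

    datastore: list of (key_vector, next_token) pairs (hypothetical layout).
    """
    scored = [
        (token, sum((a - b) ** 2 for a, b in zip(key, query)))
        for key, token in datastore
    ]
    return sorted(scored, key=lambda pair: pair[1])[:k]


def knn_distribution(neighbors, rescore=None):
    """Softmax over negative distances, aggregated per token.

    rescore: optional callable (token, distance) -> additive score adjustment.
    This is only a placeholder for a learned Rescorer module.
    """
    if not neighbors:
        return {}
    scores = []
    for token, dist in neighbors:
        score = -dist
        if rescore is not None:
            score += rescore(token, dist)
        scores.append(score)
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    probs = defaultdict(float)
    for (token, _), w in zip(neighbors, weights):
        probs[token] += w / total
    return dict(probs)


def adapted_next_token_probs(query, p_lm, datastores, k=2, lam=0.25, rescore=None):
    """Interpolate the LM distribution with a kNN distribution.

    Neighbors are retrieved from every datastore and concatenated
    (an assumed merging strategy), then optionally rescored.
    """
    neighbors = []
    for store in datastores:
        neighbors.extend(retrieve(store, query, k))
    p_knn = knn_distribution(neighbors, rescore)
    if not p_knn:
        return dict(p_lm)
    vocab = set(p_lm) | set(p_knn)
    return {
        w: lam * p_knn.get(w, 0.0) + (1.0 - lam) * p_lm.get(w, 0.0)
        for w in vocab
    }


# Toy example: a pretraining datastore plus a small adaptation datastore.
pretrain_store = [((0.0, 0.0), "the"), ((1.0, 1.0), "cat")]
adapt_store = [((0.9, 1.1), "tensor"), ((1.1, 0.9), "tensor")]
p_lm = {"the": 0.5, "cat": 0.4, "tensor": 0.1}  # made-up (possibly adapted) LM output
query = (1.0, 1.0)

p_base = adapted_next_token_probs(query, p_lm, [pretrain_store])
p_adapted = adapted_next_token_probs(query, p_lm, [pretrain_store, adapt_store])
```

In this toy setting, adding the adaptation datastore shifts probability mass toward the in-domain token `"tensor"`, since its keys lie close to the query. A learned Rescorer would replace the `rescore` hook with a trained function of the neighbor features.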