MAP's not dead yet: Uncovering true language model modes by conditioning away degeneracy

It has been widely observed that exact or approximate MAP (mode-seeking) decoding from natural language generation (NLG) models consistently leads to degenerate outputs (Stahlberg and Byrne, 2019; Holtzman et al., 2019). This has generally been attributed either to a fundamental inadequacy of modes in models or to weaknesses in language modeling. In contrast, we emphasize in this work that degenerate modes can occur even in the absence of any model error, due to contamination of the training data. Specifically, we show that mixing even a tiny amount of low-entropy noise with a population text distribution can cause the data distribution's mode to become degenerate, implying that the mode of any model trained on it will be degenerate as well. As the unconditional mode of NLG models will often be degenerate, we therefore propose to apply MAP decoding to the model's distribution conditional on avoiding specific degeneracies. Using exact search, we empirically verify that the length-conditional modes of machine translation models and language models are indeed more fluent and topical than their unconditional modes. For the first time, we also share many examples of exact modal sequences from these models, and from several variants of the LLaMA-7B model. Notably, the modes of the LLaMA models are still degenerate, showing that improvements in modeling have not fixed this issue. Because of the cost of exact mode-finding algorithms, we develop an approximate mode-finding approach, ACBS, which finds sequences that are both high-likelihood and high-quality. We apply this approach to LLaMA-7B, a model which was not trained for instruction following, and find that we are able to elicit reasonable outputs without any finetuning.
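The contamination argument can be illustrated with a small toy calculation. The sketch below is not the paper's experimental setup: the population of 1000 equally likely "sentences", the noise source that always emits the empty string, the 0.2% mixing weight, and the helper names `mix` and `mode` are all assumptions chosen for illustration. It shows that the mixture's unconditional mode is the degenerate output, while the mode conditional on a non-empty output is a population sentence.

```python
from fractions import Fraction


def mix(population, noise, eps):
    """Return (1 - eps) * population + eps * noise as a dict of probabilities."""
    support = set(population) | set(noise)
    return {
        s: (1 - eps) * population.get(s, 0) + eps * noise.get(s, 0)
        for s in support
    }


def mode(dist, condition=lambda s: True):
    """Most probable outcome among those satisfying `condition`."""
    allowed = {s: p for s, p in dist.items() if condition(s)}
    return max(allowed, key=allowed.get)


# Hypothetical population: 1000 equally likely "fluent" sentences (high entropy).
n = 1000
population = {f"sentence {i}": Fraction(1, n) for i in range(n)}

# Hypothetical noise: all probability mass on the empty string (zero entropy).
noise = {"": Fraction(1)}

# 0.2% contamination; the empty string wins whenever eps > (1 - eps) / n.
eps = Fraction(1, 500)
data = mix(population, noise, eps)

unconditional_mode = mode(data)
# Condition away the known degeneracy (here: empty output).
conditional_mode = mode(data, condition=lambda s: len(s) > 0)

print(repr(unconditional_mode), float(data[unconditional_mode]))
print(repr(conditional_mode), float(data[conditional_mode]))
```

In this toy case every population sentence has probability (1 - eps) / n, so the noise string becomes the mode once eps exceeds 1 / (n + 1). With a realistic text distribution the population is spread far more thinly, so a correspondingly smaller amount of noise would suffice.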

