Mitigating the Problem of Strong Priors in LMs with Context Extrapolation

Language models (LMs) have become important tools in a variety of applications, from data processing to the creation of instruction-following assistants. Despite their advantages, LMs have certain idiosyncratic limitations. One is the problem of "strong priors", where a model learns to output typical continuations in response to certain, usually local, portions of the input regardless of any earlier instructions. For example, prompt injection attacks can induce models to ignore explicit directives. In some cases, larger models have been shown to be more susceptible to these problems than similar smaller models, an example of the phenomenon of "inverse scaling". We develop a new technique for mitigating the problem of strong priors: we take the original set of instructions, produce a weakened version of the original prompt that is even more susceptible to the strong priors problem, and then extrapolate the continuation away from the weakened prompt. This lets us infer how the model would continue a hypothetical strengthened set of instructions. Our technique conceptualises LMs as mixture models which combine a family of data generation processes, and it reinforces the desired elements of the mixture. Our approach works at inference time, removing any need for retraining. We apply it to eleven models, including GPT-2, GPT-3, Llama 2, and Mistral, on four tasks, and find improvements in 41 of the 44 model-task combinations. Across all 44 combinations the median increase in proportion of tasks completed is 40%.
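The extrapolation step can be illustrated with a toy example. The abstract does not give the exact formula, so the sketch below assumes a linear extrapolation in log-probability space: the next-token distribution under the original prompt is pushed further along the direction that separates it from the distribution under the weakened prompt. The function names, the `alpha` parameter, and the two-token vocabulary are all hypothetical and chosen only for illustration.

```python
import math


def log_softmax(logits):
    """Convert raw logits to normalised log-probabilities."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def extrapolate(original_logits, weakened_logits, alpha=1.0):
    """Extrapolate away from the weakened prompt (assumed linear form).

    original_logits: next-token logits given the original prompt.
    weakened_logits: next-token logits given the weakened prompt.
    alpha: hypothetical extrapolation strength; 0 recovers the original
           distribution, larger values move further from the weakened one.
    """
    orig = log_softmax(original_logits)
    weak = log_softmax(weakened_logits)
    # Assumption: step past the original along (original - weakened).
    raw = [o + alpha * (o - w) for o, w in zip(orig, weak)]
    # Renormalise and return probabilities.
    return [math.exp(x) for x in log_softmax(raw)]


# Toy vocabulary: index 0 = continuation favoured by the strong prior,
# index 1 = continuation that follows the instructions.
original = [1.0, 0.5]   # the prior still wins under the original prompt
weakened = [2.0, -1.0]  # the prior wins by more under the weakened prompt

probs = extrapolate(original, weakened, alpha=1.0)
```

In this toy case the instruction-following token is the less likely one under both prompts, but it loses by a smaller margin under the original prompt, so extrapolating away from the weakened prompt makes it the more likely token. With real models the two logit vectors would come from two forward passes over the same continuation prefix, which is why the approach needs no retraining.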

