Prompting is one of the main ways to adapt a pretrained model to target tasks. Besides manually constructing prompts, many prompt optimization methods have been proposed in the literature. Method development is mainly empirically driven, with less emphasis on a conceptual understanding of prompting. In this paper we discuss how optimal prompting can be understood through a Bayesian view, which also implies some fundamental limitations of prompting that can only be overcome by tuning weights. The paper explains in detail how meta-trained neural networks behave as Bayesian predictors over the pretraining distribution; the hallmark feature of such predictors is rapid in-context adaptation. Optimal prompting can be studied formally as conditioning these Bayesian predictors, which yields criteria for the target tasks on which optimal prompting is and is not possible. We support the theory with educational experiments on LSTMs and Transformers, in which we compare different versions of prefix-tuning and different weight-tuning methods. We also confirm that soft prefixes, i.e., sequences of real-valued vectors outside the token alphabet, can yield very effective prompts for trained and even untrained networks, by manipulating activations in ways that are not achievable with hard tokens. This adds an important mechanistic aspect beyond the conceptual Bayesian theory.
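To make the conditioning view concrete, the following is a minimal sketch in our own notation (the symbols $\xi$, $\tau$, and $z$ are illustrative choices, not necessarily the paper's). A meta-trained network approximates the Bayesian mixture over the pretraining task distribution,
\[
\xi(x_{t+1} \mid x_{1:t}) \;=\; \sum_{\tau} P(x_{t+1} \mid \tau, x_{1:t})\, P(\tau \mid x_{1:t}),
\qquad
P(\tau \mid x_{1:t}) \;\propto\; P(x_{1:t} \mid \tau)\, P(\tau),
\]
where $\tau$ ranges over pretraining tasks and the posterior $P(\tau \mid x_{1:t})$ sharpens as context accumulates, which is the rapid in-context adaptation noted above. Prompting with a prefix $z$ then amounts to conditioning this mixture, and an optimal prompt for a target task can be framed as
\[
z^{*} \;=\; \arg\min_{z}\; \mathbb{E}_{x_{1:t} \sim P_{\mathrm{target}}}\!\left[ D_{\mathrm{KL}}\!\left( P_{\mathrm{target}}(\cdot \mid x_{1:t}) \,\middle\|\, \xi(\cdot \mid z, x_{1:t}) \right) \right].
\]
If no conditioning of the pretraining mixture can match the target predictor, no prompt can be optimal for that task; this is the sense in which some limitations of prompting can only be overcome by tuning weights.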
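The soft-prefix finding can likewise be illustrated in code. The sketch below is an assumption-laden illustration of prefix-tuning, not the paper's implementation: `model` (a frozen network applied directly to sequences of embedding vectors), `embed` (its input embedding table, an `nn.Embedding`), and `data_loader` (batches of target-task token ids) are hypothetical placeholders.

```python
import torch
import torch.nn as nn

def tune_soft_prefix(model, embed, data_loader, prefix_len=10, steps=500, lr=1e-2):
    """Learn a soft prefix (real-valued vectors outside the token alphabet)
    by gradient descent, keeping all network weights frozen."""
    d_model = embed.embedding_dim
    prefix = nn.Parameter(0.02 * torch.randn(prefix_len, d_model))
    opt = torch.optim.Adam([prefix], lr=lr)
    # Freeze the network and the embedding table: only the prefix is trained.
    for p in list(model.parameters()) + list(embed.parameters()):
        p.requires_grad_(False)

    for _, tokens in zip(range(steps), data_loader):   # tokens: (B, T) ids
        x = embed(tokens)                              # (B, T, d_model)
        z = prefix.unsqueeze(0).expand(tokens.size(0), -1, -1)
        logits = model(torch.cat([z, x], dim=1))       # (B, prefix_len + T, vocab)
        # Position i predicts token i + 1, so the slice below aligns the
        # outputs from position prefix_len - 1 onward with the T target tokens.
        preds = logits[:, prefix_len - 1:-1, :]
        loss = nn.functional.cross_entropy(
            preds.reshape(-1, preds.size(-1)), tokens.reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return prefix.detach()
```

Because the learned prefix lives in continuous embedding space rather than the discrete token alphabet, gradient descent applies directly, and the prefix can set activations that no hard-token sequence can produce, consistent with the mechanistic observation above.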