Soft prompt tuning achieves strong performance across a wide range of few-shot tasks. However, its performance can be highly sensitive to the initialization of the prompts. We also observe empirically that conventional prompt tuning methods cannot encode and learn sufficient task-relevant information from prompt tokens. In this work, we develop an information-theoretic framework that formulates soft prompt tuning as maximizing the mutual information between prompts and other model parameters (or encoded representations). This view helps us develop a more efficient, accurate, and robust soft prompt tuning method, InfoPrompt. Within this framework, we develop two novel mutual-information-based loss functions that (i) discover a proper prompt initialization for the downstream task and learn sufficient task-relevant information from prompt tokens, and (ii) encourage the output representation of the pretrained language model to be more aware of the task-relevant information captured in the learnt prompt. Extensive experiments validate that InfoPrompt can significantly accelerate the convergence of prompt tuning and outperform traditional prompt tuning methods. Finally, we provide a formal theoretical result showing that gradient-descent-type algorithms can be used to train our mutual information loss.
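The abstract does not say how the mutual information between prompts and encoded representations is estimated. As a rough illustration only, the sketch below assumes a contrastive (InfoNCE-style) lower bound, which is one common way to make a mutual information objective trainable by gradient descent; it is not necessarily the estimator used by InfoPrompt. The function name `info_nce_lower_bound`, the `temperature` parameter, and the array shapes are all hypothetical.

```python
import numpy as np


def info_nce_lower_bound(prompts, reps, temperature=0.1):
    """InfoNCE-style lower bound on I(prompt; representation).

    Assumption (not from the source): row i of `prompts` and row i of
    `reps` form a positive pair; every other row acts as a negative.
    Both arrays have shape (N, d).
    """
    # Cosine similarity between every prompt and every representation.
    p = prompts / np.linalg.norm(prompts, axis=1, keepdims=True)
    r = reps / np.linalg.norm(reps, axis=1, keepdims=True)
    logits = (p @ r.T) / temperature  # shape (N, N)

    # Numerically stable row-wise log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Bound: log N + mean log-probability assigned to the positive pair.
    n = prompts.shape[0]
    return float(np.mean(np.diag(log_softmax)) + np.log(n))


def mutual_information_loss(prompts, reps, temperature=0.1):
    """Loss to minimize: the negated lower bound."""
    return -info_nce_lower_bound(prompts, reps, temperature)
```

Under these assumptions, minimizing `mutual_information_loss` pushes each prompt vector toward its paired representation and away from the others. The estimate can never exceed `log(N)`, so the batch size caps how much mutual information this particular bound can certify.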