Recent advances in prompt optimization have notably enhanced the performance of pre-trained language models (PLMs) on downstream tasks. However, the potential of optimized prompts for domain generalization remains under-explored. To explore the nature of prompt generalization on unseen domains, we conduct pilot experiments and find that (i) prompts gaining more attention weight from PLMs' deep layers are more generalizable and (ii) prompts with more stable attention distributions in PLMs' deep layers are more generalizable. Thus, we propose a new objective for domain-generalizable prompt optimization, named "Concentration", which represents the "lookback" attention from the current decoding token to the prompt tokens: it increases the attention strength on prompts and reduces fluctuations in the attention distribution. We adapt this new objective to popular soft prompt and hard prompt optimization methods, respectively. Extensive experiments demonstrate that our method improves comparison prompt optimization methods by 1.42% in accuracy for soft prompt generalization and 2.16% for hard prompt generalization under the multi-source domain generalization setting, while maintaining satisfactory in-domain performance. These promising results validate the effectiveness of our proposed prompt optimization objective and provide key insights into domain-generalizable prompts.
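To make the "Concentration" idea concrete, the following is a minimal sketch of how one might score a prompt from decoder attention maps: it measures (i) the average attention mass that deep layers place on the prompt tokens and (ii) how much that mass fluctuates across decoding steps. The array layout, the `deep_layer_start` cutoff, and the strength-minus-fluctuation combination are illustrative assumptions, not the paper's exact objective.

```python
import numpy as np

def concentration_score(attn, prompt_len, deep_layer_start):
    """Hypothetical 'Concentration' proxy (not the paper's exact formulation).

    attn: array of shape (num_layers, num_steps, seq_len) holding, for each
          layer and decoding step, the attention distribution from the
          current decoding token over all positions (each row sums to 1).
    prompt_len: number of prompt tokens at the start of the sequence.
    deep_layer_start: index of the first layer counted as "deep".
    """
    deep = attn[deep_layer_start:]                  # keep deep layers only
    lookback = deep[..., :prompt_len].sum(axis=-1)  # attention mass on prompt tokens
    strength = lookback.mean()                      # (i) average lookback strength
    fluctuation = lookback.std(axis=1).mean()       # (ii) variation across decoding steps
    # Higher strength and lower fluctuation -> a more "concentrated" prompt.
    return strength - fluctuation

# Toy usage: 12 layers, 5 decoding steps, 20 sequence positions.
rng = np.random.default_rng(0)
attn = rng.random((12, 5, 20))
attn /= attn.sum(axis=-1, keepdims=True)            # normalize rows to distributions
print(concentration_score(attn, prompt_len=8, deep_layer_start=8))
```

Under this reading, the objective rewards prompts whose tokens consistently attract deep-layer attention throughout decoding, matching the two pilot findings stated above.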