Foundational image-language models have generated considerable interest because prompt learning adapts them efficiently to downstream tasks. Prompt learning treats part of the language model input as trainable while freezing the rest, and optimizes an Empirical Risk Minimization objective. However, Empirical Risk Minimization is known to suffer under distributional shift, which hurts generalization to prompts unseen during training. Leveraging the regularization ability of Bayesian methods, we frame prompt learning from a Bayesian perspective and formulate it as a variational inference problem. Our approach regularizes the prompt space, reduces overfitting to seen prompts, and improves generalization to unseen prompts. We implement the framework by modeling the input prompt space probabilistically, as an a priori distribution, which makes our proposal compatible with prompt learning approaches that are either unconditional or conditional on the image. We demonstrate empirically on 15 benchmarks that Bayesian prompt learning provides appropriate coverage of the prompt space, prevents learning spurious features, and exploits transferable invariant features. This results in better generalization to unseen prompts, even across different datasets and domains.
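For orientation, a variational formulation of this kind is commonly written as an evidence lower bound (ELBO). The sketch below is a generic form, not necessarily the exact objective used in this work. The symbols are assumptions introduced here for illustration: $x$ is an image, $y$ its class label, $z$ a latent variable over the prompt, $q_\phi$ a variational posterior with parameters $\phi$, and $p(z \mid x)$ the prior, which reduces to $p(z)$ in the unconditional case.

```latex
% Generic ELBO sketch; all symbols are illustrative assumptions, not taken from the abstract.
\log p(y \mid x)
  = \log \int p(y \mid x, z)\, p(z \mid x)\, \mathrm{d}z
  \;\geq\;
  \mathbb{E}_{q_\phi(z \mid x)}\!\left[ \log p(y \mid x, z) \right]
  - \mathrm{KL}\!\left( q_\phi(z \mid x) \,\middle\|\, p(z \mid x) \right)
```

Under this reading, the first term plays the role of the usual classification loss averaged over sampled prompts, while the KL term is the regularizer that discourages the learned prompt distribution from collapsing onto the seen prompts.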