Test-time prompt tuning (TPT) has emerged as a promising technique for adapting large vision-language models (VLMs) to unseen tasks without relying on labeled data. However, insufficient dispersion among textual features can hurt calibration, which raises concerns about the reliability, trustworthiness, and safety of VLMs. Current TPT approaches improve prompt calibration either by maximizing the average dispersion of textual features or by enforcing orthogonality constraints to encourage angular separation. These methods do not always achieve optimal angular separation between class-wise textual features, and so overlook the critical role of angular diversity. To address this, we propose A-TPT, a novel TPT framework that introduces angular diversity to encourage a uniform distribution of the normalized textual features induced by the corresponding learnable prompts. This uniformity is achieved by maximizing the minimum pairwise angular distance between features on the unit hypersphere. Through extensive experiments with various backbones on different datasets, we show that our approach consistently surpasses state-of-the-art TPT methods in reducing the aggregate average calibration error while maintaining comparable accuracy. Notably, our approach exhibits superior zero-shot calibration performance under natural distribution shifts and generalizes well to medical datasets. We provide extensive analyses, including theoretical aspects, to establish the grounding of A-TPT. These results highlight the value of promoting angular diversity to obtain well-dispersed textual features, which significantly improves VLM calibration during test-time adaptation. Our code will be made publicly available.
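The quantity at the center of the abstract, the minimum pairwise angular distance between normalized textual features, can be illustrated with a small sketch. This is not the authors' implementation: the function names `min_pairwise_angle` and `angular_diversity_loss` are hypothetical, the feature vectors are hand-written stand-ins for text-encoder outputs, and the way such a term is weighted or combined with the rest of the test-time objective is not specified here.

```python
import math
from itertools import combinations


def normalize(vector):
    """Project a vector onto the unit hypersphere (L2 normalization)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def min_pairwise_angle(features):
    """Smallest angle, in radians, between any two normalized feature vectors."""
    unit = [normalize(f) for f in features]
    angles = []
    for a, b in combinations(unit, 2):
        cosine = sum(x * y for x, y in zip(a, b))
        # Clamp to [-1, 1] to guard against floating-point drift before acos.
        angles.append(math.acos(max(-1.0, min(1.0, cosine))))
    return min(angles)


def angular_diversity_loss(features):
    """Hypothetical loss: minimizing it maximizes the minimum pairwise angle."""
    return -min_pairwise_angle(features)


# Toy class-wise "textual features" (assumed values, for illustration only).
orthogonal = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]
clustered = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]

# Well-separated features give a lower (better) loss than clustered ones.
print(angular_diversity_loss(orthogonal) < angular_diversity_loss(clustered))
```

Because the objective depends only on the closest pair, it penalizes any two classes whose features nearly coincide, even when the average dispersion over all pairs is large. In an actual test-time setting the features would be produced by the text encoder from learnable prompts and the angle would be optimized by gradient descent, which this plain-Python sketch does not attempt.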