This paper targets a novel trade-off problem in generalizable prompt learning for vision-language models (VLMs): improving performance on unseen classes while maintaining performance on seen classes. Existing generalizable methods neglect the performance degradation on seen classes. Compared with them, this problem setting is stricter and more closely matches practical applications. To solve this problem, we take an optimization perspective and leverage the relationship between loss landscape geometry and model generalization ability. We analyze the loss landscapes of the state-of-the-art method and of a vanilla Sharpness-aware Minimization (SAM) based method. From this analysis we conclude that trade-off performance correlates with both loss value and loss sharpness, and that each of them is indispensable. However, we find that the optimizing gradient of existing methods cannot stay highly relevant to both loss value and loss sharpness throughout optimization, which severely hurts their trade-off performance. To this end, we propose a novel SAM-based method for prompt learning, Gradient Constrained Sharpness-aware Context Optimization (GCSCoOp). It dynamically constrains the optimizing gradient so that both of the above objectives are optimized simultaneously. Extensive experiments verify the effectiveness of GCSCoOp on the trade-off problem.
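For readers unfamiliar with SAM, the vanilla two-step update that GCSCoOp builds on can be sketched in a few lines. This is an illustrative sketch only, under stated assumptions. It uses a hypothetical toy quadratic loss, the hypothetical helper names `sam_step`, `quad_loss` and `quad_grad`, and arbitrary values for `lr` and `rho`. It does **not** implement the gradient constraint of GCSCoOp, whose details are not given in this abstract.

```python
import math
from typing import Callable, List

Vector = List[float]


def sam_step(w: Vector, grad_fn: Callable[[Vector], Vector],
             lr: float = 0.1, rho: float = 0.05) -> Vector:
    """One vanilla SAM update (illustrative sketch, not GCSCoOp)."""
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12  # avoid division by zero
    # Step 1: move to the approximate worst-case point inside an L2 ball of radius rho.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Step 2: descend from the ORIGINAL weights using the gradient taken at w_adv.
    g_sam = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_sam)]


def quad_loss(w: Vector) -> float:
    # Hypothetical toy loss L(w) = 0.5 * (4 * w0^2 + w1^2).
    return 0.5 * (4.0 * w[0] ** 2 + w[1] ** 2)


def quad_grad(w: Vector) -> Vector:
    # Analytic gradient of quad_loss.
    return [4.0 * w[0], 1.0 * w[1]]


if __name__ == "__main__":
    w = [1.0, 1.0]
    for _ in range(50):
        w = sam_step(w, quad_grad)
    print(quad_loss(w))
```

Setting `rho=0` reduces the update to plain gradient descent on the loss value alone. A positive `rho` replaces the plain gradient with a gradient taken at a perturbed point, which brings sharpness into the update. This is the coupling of loss value and sharpness that the abstract argues must be balanced.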