This paper targets a novel trade-off problem in generalizable prompt learning for vision-language models (VLMs), i.e., improving performance on unseen classes while maintaining performance on seen classes. Compared with existing generalizable methods that neglect the degradation on seen classes, this problem setting is stricter and fits practical applications more closely. To solve this problem, we start from the optimization perspective and leverage the relationship between loss landscape geometry and model generalization ability. By analyzing the loss landscapes of the state-of-the-art method and the widely used Sharpness-aware Minimization (SAM), we conclude that the trade-off performance correlates with both loss value and loss sharpness, and that both are indispensable. However, we find that the optimizing gradient of existing methods cannot always maintain high consistency with both loss value and loss sharpness throughout the optimization procedure. To this end, we propose a novel SAM-based method for prompt learning, denoted Gradient Constrained Sharpness-aware Context Optimization (GCSCoOp), which dynamically constrains the optimizing gradient and thus achieves the above two-fold optimization objective simultaneously. Extensive experiments verify the effectiveness of GCSCoOp on the trade-off problem.
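For readers unfamiliar with SAM, the following is a minimal sketch of a generic SAM update on a toy two-dimensional quadratic loss. It is not the GCSCoOp method, whose gradient constraint is not specified in this abstract. The toy loss, the function names `loss`, `grad`, and `sam_step`, and the values of `lr` and `rho` are all assumptions made for illustration. The cosine between the plain gradient and the SAM gradient is shown only as one hypothetical way to gauge how well the two directions agree during optimization.

```python
import numpy as np

def loss(w):
    # Hypothetical toy loss: a quadratic with very different curvature per axis.
    return 0.5 * (10.0 * w[0] ** 2 + 0.1 * w[1] ** 2)

def grad(w):
    # Analytic gradient of the toy loss above.
    return np.array([10.0 * w[0], 0.1 * w[1]])

def sam_step(w, lr=0.05, rho=0.05):
    """One generic SAM update (illustrative; lr and rho are assumed values)."""
    g = grad(w)                                   # gradient of the loss value at w
    eps = rho * g / (np.linalg.norm(g) + 1e-12)   # ascent step to the nearby worst-case point
    g_sam = grad(w + eps)                         # gradient at the perturbed weights
    # Cosine similarity between the plain and SAM gradients (diagnostic only).
    denom = np.linalg.norm(g) * np.linalg.norm(g_sam) + 1e-12
    cos = float(g @ g_sam / denom)
    return w - lr * g_sam, cos

w_init = np.array([1.0, 1.0])
w = w_init.copy()
for _ in range(100):
    w, cos = sam_step(w)

print("final loss:", loss(w), "last cosine:", cos)
```

In a prompt-learning setting, `w` would correspond to the learnable context vectors and `grad` to backpropagation through the frozen VLM; those details are outside the scope of this sketch.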