Prompt-Based Continual Compositional Zero-Shot Learning

We tackle the continual adaptation of vision-language models (VLMs) to new attributes, objects, and their compositions in Compositional Zero-Shot Learning (CZSL) while preventing forgetting of prior knowledge. Unlike classical continual learning, where classes are disjoint across sessions, Continual CZSL (CCZSL) is more complex: attributes and objects may reoccur across sessions, while each composition remains unique to its session. Built on a frozen VLM backbone, we propose PromptCCZSL, the first Prompt-based Continual Compositional Zero-Shot Learning framework, which retains prior knowledge through recency-weighted multi-teacher distillation. It employs session-aware compositional prompts to fuse multimodal features for new compositions, while attribute and object prompts are learned through session-agnostic fusion to maintain global semantic consistency; a Cosine Anchor Loss (CAL) further stabilizes these prompts to preserve prior knowledge. To enhance adaptation in the current session, an Orthogonal Projection Loss (OPL) keeps new attribute and object embeddings distinct from previous ones, preventing overlap, while an Intra-Session Diversity Loss (IDL) promotes variation among current-session embeddings, yielding richer and more discriminative representations. We also introduce a comprehensive evaluation protocol that jointly measures catastrophic forgetting and compositional generalization. Extensive experiments on the UT-Zappos and C-GQA benchmarks show that PromptCCZSL substantially outperforms prior VLM-based and non-VLM baselines, setting a new benchmark for CCZSL in the closed-world setting.
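The abstract names four training objectives but gives no formulas for them. The Python sketch below shows one plausible reading of each, operating on plain lists of floats so it runs with the standard library alone. Everything in it is an assumption for illustration, not the PromptCCZSL definition: the function names, the exponential recency schedule with its `decay` parameter, the distillation `temperature`, the use of KL(teacher || student) with T² scaling (borrowed from standard knowledge distillation), and the squared-cosine forms chosen for CAL, OPL, and IDL.

```python
# Hypothetical sketch of the four objectives named in the abstract.
# Not the PromptCCZSL reference implementation: all forms are assumptions.
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_anchor_loss(current: List[Vector], anchors: List[Vector]) -> float:
    """Assumed CAL: mean (1 - cos) between each current attribute/object
    embedding and its frozen copy (anchor) from the previous session."""
    return sum(1.0 - cosine(c, a) for c, a in zip(current, anchors)) / len(current)


def orthogonal_projection_loss(new: List[Vector], old: List[Vector]) -> float:
    """Assumed OPL: mean squared cosine between every new embedding and every
    previous-session embedding; zero when they are mutually orthogonal."""
    total = sum(cosine(n, o) ** 2 for n in new for o in old)
    return total / (len(new) * len(old))


def intra_session_diversity_loss(embeddings: List[Vector]) -> float:
    """Assumed IDL: mean squared cosine over distinct pairs of current-session
    embeddings; minimizing it pushes them apart. Needs two or more embeddings."""
    n = len(embeddings)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(cosine(embeddings[i], embeddings[j]) ** 2 for i, j in pairs) / len(pairs)


def softmax(logits: Vector, temperature: float) -> List[float]:
    """Temperature-scaled softmax, shifted by the max logit for stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Vector, q: Vector) -> float:
    """KL(p || q) for strictly positive distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def recency_weights(num_teachers: int, decay: float) -> List[float]:
    """Hypothetical exponential recency weighting: teacher k (0 = oldest
    session) gets weight proportional to decay ** (num_teachers - 1 - k)."""
    raw = [decay ** (num_teachers - 1 - k) for k in range(num_teachers)]
    total = sum(raw)
    return [w / total for w in raw]


def multi_teacher_distillation(student_logits: Vector,
                               teacher_logits: List[Vector],
                               decay: float = 0.5,
                               temperature: float = 2.0) -> float:
    """Recency-weighted sum of KL(teacher || student) over frozen teachers
    from earlier sessions, scaled by T^2 as in standard distillation."""
    weights = recency_weights(len(teacher_logits), decay)
    p_student = softmax(student_logits, temperature)
    loss = sum(w * kl_divergence(softmax(t, temperature), p_student)
               for w, t in zip(weights, teacher_logits))
    return loss * temperature ** 2
```

With three earlier-session teachers and `decay=0.5`, `recency_weights(3, 0.5)` gives weights 1/7, 2/7, and 4/7, so the most recent teacher dominates while older ones still contribute. The squared cosine in the OPL and IDL sketches is sign-agnostic, so anti-aligned embeddings are also penalized as overlap; a real implementation might choose differently.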

