Compositional Zero-Shot Learning (CZSL) aims to learn semantic primitives (attributes and objects) from seen compositions and to recognize unseen attribute-object compositions. Existing CZSL datasets focus on single attributes, neglecting the fact that objects naturally exhibit multiple interrelated attributes. Their narrow attribute scope and single-attribute labeling introduce annotation biases, which mislead attribute learning and make evaluation inaccurate. To address these issues, we introduce the Multi-Attribute Composition (MAC) dataset, encompassing 22,838 images and 17,627 compositions with comprehensive and representative attribute annotations. MAC exhibits complex relationships between attributes and objects: each attribute type is linked to an average of 82.2 object types, and each object type is associated with an average of 31.4 attribute types. Based on MAC, we propose the task of multi-attribute compositional zero-shot learning, which requires deeper semantic understanding and advanced attribute associations, establishing a more realistic and challenging benchmark for CZSL. We also propose the Multi-attribute Visual-Primitive Integrator (MVP-Integrator), a robust baseline for multi-attribute CZSL that disentangles semantic primitives and performs effective visual-primitive association. Experimental results demonstrate that MVP-Integrator significantly outperforms existing CZSL methods on MAC while also improving inference efficiency.
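The attribute-object association statistics quoted above (average object types per attribute type, and average attribute types per object type) can be illustrated with a minimal sketch. The annotation format used here, a list of `(object_label, attribute_set)` pairs, is an assumption for illustration and not the actual MAC file format; the function name `association_stats` and the toy labels are likewise hypothetical, and the abstract does not state how its figures were computed.

```python
from collections import defaultdict


def association_stats(samples):
    """Return (avg. object types per attribute, avg. attribute types per object).

    `samples` is an iterable of (object_label, attribute_labels) pairs.
    This layout is an assumed, illustrative format, not the MAC schema.
    """
    objects_per_attr = defaultdict(set)   # attribute -> object types seen with it
    attrs_per_object = defaultdict(set)   # object -> attribute types seen on it
    for obj, attrs in samples:
        for attr in attrs:
            objects_per_attr[attr].add(obj)
            attrs_per_object[obj].add(attr)

    if not objects_per_attr:
        return 0.0, 0.0

    avg_objs = sum(len(s) for s in objects_per_attr.values()) / len(objects_per_attr)
    avg_attrs = sum(len(s) for s in attrs_per_object.values()) / len(attrs_per_object)
    return avg_objs, avg_attrs


# Toy multi-attribute annotations (hypothetical labels, not taken from MAC).
toy = [
    ("apple", {"red", "round", "fresh"}),
    ("apple", {"green", "round"}),
    ("ball", {"red", "round"}),
    ("towel", {"wet", "green"}),
]

avg_objs_per_attr, avg_attrs_per_obj = association_stats(toy)
print(f"objects per attribute: {avg_objs_per_attr:.2f}")
print(f"attributes per object: {avg_attrs_per_obj:.2f}")
```

In this toy set, five attribute types share eight attribute-object links and three object types share the same eight links, so the two averages differ only because the denominators differ. The same asymmetry would explain why a dataset can report a larger average on the attribute side than on the object side.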