Compositional Zero-Shot Learning (CZSL) aims to learn semantic primitives (attributes and objects) from seen compositions and to recognize unseen attribute-object compositions. Existing CZSL datasets focus on single attributes, neglecting the fact that objects naturally exhibit multiple interrelated attributes. Their narrow attribute scope and single-attribute labeling introduce annotation biases, which mislead attribute learning and cause inaccurate evaluation. To address these issues, we introduce the Multi-Attribute Composition (MAC) dataset, encompassing 22,838 images and 17,627 compositions with comprehensive and representative attribute annotations. MAC exhibits complex relationships between attributes and objects: on average, each attribute type is linked to 82.2 object types, and each object type is associated with 31.4 attribute types. Based on MAC, we propose the task of multi-attribute compositional zero-shot learning, which requires deeper semantic understanding and more advanced attribute association, establishing a more realistic and challenging benchmark for CZSL. We also propose the Multi-attribute Visual-Primitive Integrator (MVP-Integrator), a robust baseline for multi-attribute CZSL that disentangles semantic primitives and performs effective visual-primitive association. Experimental results demonstrate that MVP-Integrator significantly outperforms existing CZSL methods on MAC while improving inference efficiency.
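To make the dataset statistics concrete, the following minimal sketch shows one way such attribute-object association averages could be computed from multi-attribute annotations. The annotation format (a set of attribute names paired with one object name per image), the function name `association_stats`, and the toy samples are all assumptions for illustration; they are not the MAC file format or the authors' tooling, and the exact counting protocol behind the reported figures may differ.

```python
from collections import defaultdict


def association_stats(samples):
    """Summarize attribute-object associations.

    `samples` is a hypothetical annotation list: each item is a pair
    (attributes, obj), where `attributes` is an iterable of attribute
    names and `obj` is a single object name.
    """
    objects_per_attr = defaultdict(set)   # attribute -> objects it appears with
    attrs_per_object = defaultdict(set)   # object -> attributes it appears with
    compositions = set()                  # distinct (attribute set, object) pairs

    for attributes, obj in samples:
        attrs = frozenset(attributes)
        compositions.add((attrs, obj))
        for attr in attrs:
            objects_per_attr[attr].add(obj)
            attrs_per_object[obj].add(attr)

    if not objects_per_attr or not attrs_per_object:
        raise ValueError("samples must contain at least one attribute-object pair")

    # Mean number of distinct partners per attribute type and per object type.
    avg_objects = sum(len(s) for s in objects_per_attr.values()) / len(objects_per_attr)
    avg_attrs = sum(len(s) for s in attrs_per_object.values()) / len(attrs_per_object)

    return {
        "num_compositions": len(compositions),
        "avg_objects_per_attribute": avg_objects,
        "avg_attributes_per_object": avg_attrs,
    }


# Toy annotations (made up for illustration, not taken from MAC).
toy_samples = [
    (("red", "ripe"), "apple"),
    (("red", "shiny"), "car"),
    (("green", "ripe"), "apple"),
    (("shiny",), "car"),
]
print(association_stats(toy_samples))
```

In this sketch a composition is treated as the full attribute set together with the object, so two images of the same object with different attribute sets count as different compositions. That is one plausible reading of multi-attribute composition, stated here as an assumption.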