Compositional Zero-Shot Learning (CZSL) aims to learn semantic primitives (attributes and objects) from seen compositions and to recognize unseen attribute-object compositions. Existing CZSL datasets focus on single attributes, even though real-world objects naturally exhibit multiple interrelated attributes. Their narrow attribute scope and single-attribute labeling introduce annotation biases that undermine both model performance and evaluation. To address these limitations, we introduce the Multi-Attribute Composition (MAC) dataset, encompassing 18,217 images and 11,067 compositions with comprehensive, representative, and diverse attribute annotations. MAC includes an average of 30.2 attributes per object and 65.4 objects per attribute, facilitating better multi-attribute composition predictions. Our dataset supports deeper semantic understanding and higher-order attribute associations, providing a more realistic and challenging benchmark for the CZSL task. We also develop solutions for multi-attribute compositional learning and propose the MM-encoder to disentangle attributes and objects.
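As a rough illustration of the per-object and per-attribute averages quoted above, the following sketch computes both quantities on a toy set of multi-attribute annotations. The annotation layout, the toy labels, and the function name `attribute_object_stats` are assumptions made for illustration, not the MAC file format, and we assume each average counts distinct attributes (or objects) that co-occur at least once.

```python
from collections import defaultdict

# Hypothetical toy annotations: one object and a set of attributes per image.
# This layout is an assumption for illustration, not the MAC data format.
annotations = [
    {"object": "apple", "attributes": {"red", "round", "shiny"}},
    {"object": "apple", "attributes": {"green", "round"}},
    {"object": "ball", "attributes": {"red", "round"}},
]


def attribute_object_stats(annotations):
    """Return (avg. distinct attributes per object, avg. distinct objects per attribute)."""
    attrs_by_object = defaultdict(set)
    objects_by_attr = defaultdict(set)
    for ann in annotations:
        for attr in ann["attributes"]:
            attrs_by_object[ann["object"]].add(attr)
            objects_by_attr[attr].add(ann["object"])
    if not attrs_by_object:
        # No attribute-object pairs were observed; avoid dividing by zero.
        return 0.0, 0.0
    avg_attrs = sum(len(a) for a in attrs_by_object.values()) / len(attrs_by_object)
    avg_objs = sum(len(o) for o in objects_by_attr.values()) / len(objects_by_attr)
    return avg_attrs, avg_objs


avg_attrs_per_object, avg_objects_per_attr = attribute_object_stats(annotations)
print(avg_attrs_per_object, avg_objects_per_attr)  # 3.0 1.5
```

In this toy set, "apple" co-occurs with four distinct attributes and "ball" with two, so the first average is 3.0. The four attributes co-occur with two, two, one, and one objects, giving 1.5 for the second.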