Open-World Compositional Zero-shot Learning (OW-CZSL) aims to recognize novel compositions of state and object primitives in images with no priors on the compositional space, which induces a tremendously large output space containing all possible state-object compositions. Existing works either learn a joint compositional state-object embedding or predict simple primitives with separate classifiers. However, the former relies heavily on external word embedding methods, and the latter ignores the interactions between interdependent primitives. In this paper, we revisit the primitive prediction approach and propose a novel method, termed Progressive Cross-primitive Compatibility (ProCC), to mimic the human learning process for OW-CZSL tasks. Specifically, the cross-primitive compatibility module explicitly learns to model the interactions of state and object features with trainable memory units, efficiently acquiring cross-primitive visual attention to reason about high-feasibility compositions without the aid of external knowledge. Moreover, considering the partial-supervision setting (pCZSL) as well as the imbalance issue in multi-task prediction, we design a progressive training paradigm that enables the primitive classifiers to interact and obtain discriminative information in an easy-to-hard manner. Extensive experiments on three widely used benchmark datasets demonstrate that our method outperforms other representative methods in both the OW-CZSL and pCZSL settings by large margins.
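To make the size of the open-world output space concrete, the following is a minimal sketch of the separate-classifier baseline that the abstract contrasts against, not of ProCC itself. It assumes each composition is scored as the product of independent state and object probabilities; the function name `composition_scores` and the toy probabilities are hypothetical and chosen only for illustration.

```python
from itertools import product


def composition_scores(state_probs, object_probs):
    """Score every state-object pair as the product of the two primitive
    probabilities (an assumed independence baseline, not the ProCC module)."""
    return {
        (state, obj): p_state * p_obj
        for (state, p_state), (obj, p_obj) in product(
            state_probs.items(), object_probs.items()
        )
    }


# Hypothetical outputs of two separate primitive classifiers for one image.
state_probs = {"wet": 0.7, "dry": 0.2, "sliced": 0.1}
object_probs = {"dog": 0.6, "apple": 0.3, "road": 0.1}

scores = composition_scores(state_probs, object_probs)
best = max(scores, key=scores.get)

# With no prior on the compositional space, every pair is a candidate,
# so the output space has |states| * |objects| entries.
print(len(scores))  # 9
print(best)         # ('wet', 'dog')
```

Because the two factors are computed independently, infeasible pairs such as ("sliced", "dog") still receive a nonzero score here; this is the missing interaction between primitives that the abstract says the cross-primitive compatibility module is meant to model.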