While weakly-supervised object localization (WSOL) has achieved remarkable success, current frameworks cannot locate objects of novel categories in open-world settings. To address this limitation, we are the first to introduce a new weakly-supervised object localization task, OWSOL (Open-World Weakly-Supervised Object Localization). During training, all labeled data come from known categories, whereas the unlabeled data contain both known and novel categories. To handle such data, we propose a novel paradigm of contrastive representation co-learning that uses both labeled and unlabeled data to generate a complete G-CAM (Generalized Class Activation Map) for object localization, without requiring bounding-box annotations. Since no class labels are available for the unlabeled data, we cluster the full training set and design a novel contrastive loss driven by multiple semantic centroids for representation learning. We re-organize two widely used datasets, ImageNet-1K and iNatLoc500, and propose OpenImages150 to serve as evaluation benchmarks for OWSOL. Extensive experiments demonstrate that the proposed method surpasses all baselines by a large margin. We believe this work can shift localization from the closed-set setting to the open-world setting and serve as a foundation for subsequent work. Code will be released at https://github.com/ryylcc/OWSOL.
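The abstract does not define the centroid-driven loss, so the following is only a minimal sketch of a generic centroid-based contrastive objective in the spirit of prototype contrastive learning, not the OWSOL authors' loss. It assumes that embeddings are compared by cosine similarity, that pseudo-labels come from k-means over the whole training set (labeled and unlabeled together), and that "multiple semantic centroids" can be modeled as several clusterings with different cluster counts `ks`. The function names and the `ks` and `temperature` parameters are hypothetical.

```python
import math
import random

# Hypothetical sketch of a centroid-driven contrastive loss; NOT the OWSOL authors' implementation.


def normalize(v):
    """Scale a vector to unit L2 norm so dot products equal cosine similarities."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Spherical k-means on unit vectors: returns (assignments, unit-norm centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its most similar centroid (these act as pseudo-labels).
        assign = [max(range(k), key=lambda c: dot(p, centroids[c])) for p in points]
        # Recompute each non-empty cluster's centroid as the normalized mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[c] = normalize(mean)
    return assign, centroids


def centroid_contrastive_loss(embeddings, ks=(2, 3), temperature=0.1):
    """InfoNCE-style loss pulling each embedding toward its assigned centroid and away
    from the other centroids, averaged over one clustering per cluster count in `ks`
    (the hypothetical stand-in for 'multiple semantic centroids')."""
    points = [normalize(e) for e in embeddings]
    total = 0.0
    for k in ks:
        assign, centroids = kmeans(points, k)
        for p, a in zip(points, assign):
            logits = [dot(p, c) / temperature for c in centroids]
            m = max(logits)
            # Numerically stable log-sum-exp over all centroids.
            log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
            # Negative log-softmax probability of the assigned centroid.
            total += log_denominator - logits[a]
    return total / (len(ks) * len(points))


if __name__ == "__main__":
    # Two tight, well-separated groups of 2-D embeddings.
    pts = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0], [0.05, 0.99]]
    print(centroid_contrastive_loss(pts, ks=(2,)))
```

Plain Python lists are used only to keep the sketch self-contained; in a real training pipeline the embeddings would come from the backbone network and the loss would be computed with an autodiff framework so it can be back-propagated.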