While remarkable success has been achieved in weakly-supervised object localization (WSOL), current frameworks cannot locate objects of novel categories in open-world settings. To address this issue, we are the first to introduce a new weakly-supervised object localization task called OWSOL (Open-World Weakly-Supervised Object Localization). During training, all labeled data come from known categories, while the unlabeled data contain both known and novel categories. To handle such data, we propose a novel paradigm of contrastive representation co-learning that uses both labeled and unlabeled data to generate a complete G-CAM (Generalized Class Activation Map) for object localization, without requiring bounding-box annotations. As no class label is available for the unlabeled data, we conduct clustering over the full training set and design a novel multiple semantic centroids-driven contrastive loss for representation learning. We re-organize two widely used datasets, i.e., ImageNet-1K and iNatLoc500, and propose OpenImages150 to serve as evaluation benchmarks for OWSOL. Extensive experiments demonstrate that the proposed method surpasses all baselines by a large margin. We believe that this work can shift closed-set localization towards the open-world setting and serve as a foundation for subsequent works. Code will be released at https://github.com/ryylcc/OWSOL.
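To make the idea of clustering the full training set and contrasting features against several semantic centroids more concrete, the following is a minimal NumPy sketch under stated assumptions. It is not the loss proposed in this work: the function names (`kmeans_centroids`, `multi_centroid_contrastive_loss`), the use of spherical k-means, the choice of the top-`num_positives` nearest centroids as positives, and the temperature value are all hypothetical choices made purely for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row of x to (approximately) unit Euclidean length."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def kmeans_centroids(features, num_clusters, num_iters=20, seed=0):
    """Spherical k-means over all features (labeled and unlabeled alike).

    Assumption: cosine similarity is the clustering metric; the abstract
    does not say which clustering algorithm is used.
    """
    rng = np.random.default_rng(seed)
    feats = l2_normalize(features)
    # Initialise centroids from randomly chosen samples (a copy, not a view).
    init = rng.choice(len(feats), size=num_clusters, replace=False)
    centroids = feats[init]
    for _ in range(num_iters):
        # Assign each sample to its most similar centroid.
        assign = np.argmax(feats @ centroids.T, axis=1)
        for k in range(num_clusters):
            members = feats[assign == k]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[k] = members.mean(axis=0)
        centroids = l2_normalize(centroids)
    return centroids


def multi_centroid_contrastive_loss(features, centroids,
                                    num_positives=2, temperature=0.1):
    """Pull each feature towards several centroids, push it from the rest.

    Assumption: the positives are the `num_positives` centroids most similar
    to the feature; all centroids appear in the softmax denominator.
    """
    feats = l2_normalize(features)
    logits = feats @ centroids.T / temperature          # shape (N, K)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Indices of the most similar centroids for every sample.
    pos_idx = np.argsort(-logits, axis=1)[:, :num_positives]
    pos_log_prob = np.take_along_axis(log_prob, pos_idx, axis=1)
    return float(-pos_log_prob.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    embeddings = rng.normal(size=(60, 8))  # stand-in for backbone features
    cents = kmeans_centroids(embeddings, num_clusters=5)
    print(multi_centroid_contrastive_loss(embeddings, cents))
```

In this sketch, using more than one positive centroid per sample is meant to mirror the "multiple semantic centroids" wording; in a real training loop the embeddings would come from the network and the loss would be computed in an autodiff framework rather than plain NumPy.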