The recent enthusiasm for open-world vision systems shows the community's strong interest in performing perception tasks outside the closed-vocabulary benchmark setups that have been so popular until now. Being able to discover objects in images and videos without knowing in advance which objects populate the dataset is an exciting prospect. But how can objects be found without knowing anything about them? Recent works show that it is possible to perform class-agnostic unsupervised object localization by exploiting self-supervised pre-trained features. We propose here a survey of unsupervised object localization methods that, in the era of self-supervised ViTs, discover objects in images without requiring any manual annotation. We gather links to the discussed methods in the repository https://github.com/valeoai/Awesome-Unsupervised-Object-Localization.