The recent enthusiasm for open-world vision systems shows the community's strong interest in performing perception tasks outside the closed-vocabulary benchmark setups that have been so popular until now. Being able to discover objects in images and videos without knowing in advance which objects populate the dataset is an exciting prospect. But how can objects be found without knowing anything about them? Recent works show that class-agnostic unsupervised object localization is possible by exploiting self-supervised pre-trained features. We present a survey of unsupervised object localization methods that, in the era of self-supervised ViTs, discover objects in images without requiring any manual annotation. We gather links to the discussed methods in the repository https://github.com/valeoai/Awesome-Unsupervised-Object-Localization.