Self-supervised learning (SSL) can be used to solve complex visual tasks without human labels. Self-supervised representations encode useful semantic information about images and have therefore already been used for tasks such as unsupervised semantic segmentation. In this paper, we investigate self-supervised representations for instance segmentation without any manual annotations. We find that the features of different SSL methods vary in their level of instance-awareness. In particular, DINO features, which are known to be excellent semantic descriptors, lag behind MAE features in their ability to separate instances.