In this paper we tackle the problem of Generalized Category Discovery (GCD). Specifically, given a dataset with labelled and unlabelled images, the task is to cluster all images in the unlabelled subset, whether or not they belong to the labelled categories. Our first contribution is to recognize that most existing GCD benchmarks only contain labels for a single clustering of the data, making it difficult to ascertain whether models are using the available labels to solve the GCD task or simply solving an unsupervised clustering problem. We therefore present a synthetic dataset for category discovery, named 'Clevr-4'. Clevr-4 contains four equally valid partitions of the data, i.e., based on object shape, texture, color, or count. To solve the task, models must extrapolate the taxonomy specified by the labelled set, rather than simply latching onto a single natural grouping of the data. We use this dataset to demonstrate the limitations of unsupervised clustering in the GCD setting, showing that even very strong unsupervised models fail on Clevr-4. We further use Clevr-4 to examine the weaknesses of existing GCD algorithms, and propose a new method that addresses these shortcomings by leveraging consistent findings from the representation learning literature. Our simple solution, which is based on 'mean teachers' and termed $\mu$GCD, substantially outperforms the implemented baselines on Clevr-4. Finally, when we transfer these findings to real data on the challenging Semantic Shift Benchmark (SSB), we find that $\mu$GCD outperforms all prior work, setting a new state-of-the-art. For the project webpage, see https://www.robots.ox.ac.uk/~vgg/data/clevr4/
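The 'mean teacher' idea mentioned in the abstract is commonly realized by keeping a teacher whose weights are an exponential moving average (EMA) of the student's weights. The following is a minimal sketch of that generic update only, not the $\mu$GCD implementation: the function name `ema_update`, the dict-of-floats parameter representation, and the `momentum` values are all assumptions made for illustration.

```python
from typing import Dict


def ema_update(
    teacher: Dict[str, float],
    student: Dict[str, float],
    momentum: float = 0.99,
) -> Dict[str, float]:
    """Return new teacher parameters as an EMA of teacher and student.

    Hypothetical helper: parameters are plain floats keyed by name,
    standing in for the weight tensors of a real network.
    """
    if not 0.0 <= momentum <= 1.0:
        raise ValueError("momentum must lie in [0, 1]")
    if teacher.keys() != student.keys():
        raise ValueError("teacher and student must share parameter names")
    # Each teacher weight moves a fraction (1 - momentum) towards the student.
    return {
        name: momentum * teacher[name] + (1.0 - momentum) * student[name]
        for name in teacher
    }


# Usage: the teacher drifts slowly towards a fixed student.
teacher_params = {"w": 0.0}
student_params = {"w": 1.0}
after_one = ema_update(teacher_params, student_params, momentum=0.9)
after_two = ema_update(after_one, student_params, momentum=0.9)
```

With a momentum close to 1 the teacher changes slowly, which is why such a teacher is often used as a more stable source of targets for the student; the specific schedule and targets used by $\mu$GCD are not stated in the abstract.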