Labels are costly, noisy, and over-specialized; learning cheap, reliable, and transferable models calls for setting them aside in favor of unsupervised learning. To that end, spectral embedding, self-supervised learning, and generative modeling have offered competitive solutions. Those methods, however, come with numerous challenges, \textit{e.g.}, estimating geodesic distances, specifying projector architectures and anti-collapse losses, or specifying decoder architectures and reconstruction losses. In contrast, we introduce a simple, explainable alternative, coined \textbf{DIET}, to learn representations from unlabeled data, free of those challenges. \textbf{DIET} is deliberately simple: take one's favorite classification setup and use the \textbf{D}atum \textbf{I}nd\textbf{E}x as its \textbf{T}arget class, \textit{i.e.}, \textit{each sample is its own class}, with no further changes needed. \textbf{DIET} works without a decoder/projector network, relies on neither positive pairs nor reconstruction, introduces no hyper-parameters, and works out-of-the-box across datasets and architectures. Despite \textbf{DIET}'s simplicity, the learned representations are of high quality and often on par with the state-of-the-art, \textit{e.g.}, a linear classifier on top of \textbf{DIET}'s learned representation reaches $71.4\%$ accuracy on CIFAR100 with a ResNet101 and $52.5\%$ on TinyImageNet with a ResNeXt50.
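To make the "each sample is its own class" recipe concrete, the following is a minimal sketch in standard-library Python, not the authors' implementation. It assumes a toy linear encoder, a linear $N$-way head, plain cross-entropy, and per-sample SGD; the names `diet_targets` and `train_diet`, the toy data, and the values of `embed_dim`, `epochs`, and `lr` are all hypothetical choices made for illustration. In practice one would substitute one's own classification setup, as the abstract suggests.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def diet_targets(num_samples):
    """The only 'labelling' step: sample n is assigned class n."""
    return list(range(num_samples))


def train_diet(data, embed_dim=3, epochs=300, lr=0.1, seed=0):
    """Train a toy linear encoder E and an N-way head W on datum-index targets.

    Assumption: encoder and head are both linear and trained with plain
    per-sample SGD; any standard classification setup could replace them.
    """
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    targets = diet_targets(n)
    E = [[rng.gauss(0, 0.5) for _ in range(d)] for _ in range(embed_dim)]
    W = [[rng.gauss(0, 0.5) for _ in range(embed_dim)] for _ in range(n)]
    history = []
    for _ in range(epochs):
        epoch_loss = 0.0
        for x, t in zip(data, targets):
            z = matvec(E, x)              # representation of x
            p = softmax(matvec(W, z))     # N-way class probabilities
            epoch_loss += -math.log(p[t])
            g = list(p)                   # dLoss/dlogits = p - onehot(t)
            g[t] -= 1.0
            # Backpropagate to z before W is modified.
            dz = [sum(g[c] * W[c][j] for c in range(n)) for j in range(embed_dim)]
            for c in range(n):
                for j in range(embed_dim):
                    W[c][j] -= lr * g[c] * z[j]
            for j in range(embed_dim):
                for i in range(d):
                    E[j][i] -= lr * dz[j] * x[i]
        history.append(epoch_loss / n)
    return E, W, history


# Toy unlabeled dataset: five points in three dimensions (illustrative only).
data = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0],
]
E, W, history = train_diet(data)
embeddings = [matvec(E, x) for x in data]  # features for a downstream probe
```

In this sketch only the encoder output `matvec(E, x)` would be passed to a downstream linear classifier; the head `W` has one row per training sample and serves only the datum-index objective.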