Visualization methods based on the nearest neighbor graph, such as t-SNE or UMAP, are widely used for visualizing high-dimensional data. Yet these approaches only produce meaningful results if the nearest neighbors themselves are meaningful. For images represented in pixel space this is not the case: distances in pixel space often fail to capture our sense of similarity, so nearest neighbors are not semantically close. This problem can be circumvented by self-supervised approaches based on contrastive learning, such as SimCLR, which rely on data augmentation to generate implicit neighbors, but these methods do not produce two-dimensional embeddings suitable for visualization. Here, we present a new method, called t-SimCNE, for unsupervised visualization of image data. t-SimCNE combines ideas from contrastive learning and neighbor embeddings, and trains a parametric mapping from the high-dimensional pixel space into two dimensions. We show that the resulting 2D embeddings achieve classification accuracy comparable to the state-of-the-art high-dimensional SimCLR representations, thus faithfully capturing semantic relationships. Using t-SimCNE, we obtain informative visualizations of the CIFAR-10 and CIFAR-100 datasets, showing rich cluster structure and highlighting artifacts and outliers.
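The abstract does not spell out the training objective. As an illustrative sketch only, one way to combine the two ideas is an InfoNCE-style contrastive loss (as used in SimCLR) in which the similarity between two 2D points is a Cauchy kernel on their Euclidean distance (as used in t-SNE). This choice of loss and kernel is an assumption made for illustration, the function names are hypothetical, and the network that maps images to 2D points is omitted.

```python
import math


def cauchy_similarity(u, v):
    """Cauchy (t-distribution) kernel on squared Euclidean distance.

    Assumption: this t-SNE-style kernel stands in for the cosine
    similarity commonly used in SimCLR-style losses.
    """
    d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return 1.0 / (1.0 + d2)


def contrastive_loss_2d(views_a, views_b):
    """InfoNCE-style loss over a batch of paired 2D embeddings.

    views_a[i] and views_b[i] are the 2D embeddings of two augmented
    versions of image i (the "implicit neighbors"). Every other point
    in the batch is treated as a negative example.
    """
    points = list(views_a) + list(views_b)
    n = len(views_a)
    total = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)  # index of the positive partner of point i
        positive = cauchy_similarity(points[i], points[j])
        # Normalizer: similarity to every other point, positive included.
        normalizer = sum(
            cauchy_similarity(points[i], points[k])
            for k in range(2 * n)
            if k != i
        )
        total += -math.log(positive / normalizer)
    return total / (2 * n)


# Two images, two augmented views each (hypothetical 2D coordinates).
# "tight": views of the same image lie close together in the plane.
tight = contrastive_loss_2d([(0.0, 0.0), (5.0, 5.0)], [(0.1, 0.0), (5.0, 5.1)])
# "loose": views of the same image lie far apart, near the other image.
loose = contrastive_loss_2d([(0.0, 0.0), (5.0, 5.0)], [(5.0, 5.1), (0.1, 0.0)])
print(tight < loose)
```

Under these assumptions the loss is lower when augmented views of the same image land near each other in the plane and far from views of other images, which is the property a 2D visualization needs.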