A common way to evaluate the reliability of dimensionality reduction (DR) embeddings is to quantify how well labeled classes form compact, mutually separated clusters in the embeddings. This approach rests on the assumption that the classes form clear clusters in the original high-dimensional space. In reality, this assumption can be violated: a single class can be fragmented into multiple separated clusters, and multiple classes can merge into a single cluster. An evaluation based on class labels therefore cannot always be trusted. In this paper, we introduce two novel quality measures, Label-Trustworthiness and Label-Continuity (Label-T&C), that advance DR evaluation based on class labels. Instead of assuming that classes are well clustered in the original space, Label-T&C (1) estimate the extent to which classes form clusters in the original and embedded spaces and (2) evaluate the difference between the two. A quantitative evaluation showed that Label-T&C outperform widely used DR evaluation measures (e.g., Trustworthiness and Continuity, Kullback-Leibler divergence) in the accuracy with which they assess how well DR embeddings preserve cluster structure, and that they are also scalable. Moreover, we present case studies demonstrating that Label-T&C can be used to reveal the intrinsic characteristics of DR techniques and their hyperparameters.
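The two-step idea, (1) estimating how well classes form clusters in each space and (2) comparing the two estimates, can be illustrated with a minimal sketch. This is not the paper's implementation. The separability score below (centroid distance relative to within-class spread) is a stand-in chosen for brevity, the function names `pair_separability` and `label_scores_sketch` are hypothetical, and the two returned values are deliberately given neutral names because the abstract does not specify how each of the two measures is computed.

```python
import numpy as np


def pair_separability(points_a, points_b):
    """Toy separability of two classes, in [0, 1).

    Assumption: a stand-in score, not the measure used in the paper.
    It compares the distance between class centroids with the mean
    distance of points to their own centroid.
    """
    centroid_a = points_a.mean(axis=0)
    centroid_b = points_b.mean(axis=0)
    between = np.linalg.norm(centroid_a - centroid_b)
    within = (
        np.linalg.norm(points_a - centroid_a, axis=1).mean()
        + np.linalg.norm(points_b - centroid_b, axis=1).mean()
    )
    return between / (between + within + 1e-12)


def label_scores_sketch(original, embedding, labels):
    """Compare class-pair separability in the original and embedded spaces.

    Returns two scores in (0, 1], where 1 means no distortion:
      merge_score: drops when class pairs are less separated in the
                   embedding than in the original space.
      split_score: drops when class pairs are more separated in the
                   embedding than in the original space.
    """
    classes = np.unique(labels)
    lost, gained = [], []
    for i in range(len(classes)):
        for j in range(i + 1, len(classes)):
            mask_a = labels == classes[i]
            mask_b = labels == classes[j]
            # Step 1: estimate cluster structure in each space.
            s_orig = pair_separability(original[mask_a], original[mask_b])
            s_emb = pair_separability(embedding[mask_a], embedding[mask_b])
            # Step 2: measure the difference, split by direction.
            lost.append(max(s_orig - s_emb, 0.0))
            gained.append(max(s_emb - s_orig, 0.0))
    merge_score = 1.0 - float(np.mean(lost))
    split_score = 1.0 - float(np.mean(gained))
    return merge_score, split_score
```

Because the scores depend on the difference between the two spaces and not on the embedding alone, an embedding that keeps overlapping classes overlapping is not penalized under this sketch, whereas one that merges classes that were separated in the original space is.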