A common way to evaluate the reliability of dimensionality reduction (DR) embeddings is to quantify how well labeled classes form compact, mutually separated clusters in the embeddings. This approach assumes that the classes already form clear clusters in the original high-dimensional space. In practice, this assumption can be violated: a single class can be fragmented into multiple separated clusters, and multiple classes can be merged into a single cluster. Evaluations based on class labels therefore cannot always be trusted. In this paper, we introduce two novel quality measures, Label-Trustworthiness and Label-Continuity (Label-T&C), that advance DR evaluation based on class labels. Instead of assuming that classes are well-clustered in the original space, Label-T&C (1) estimate the extent to which classes form clusters in the original and embedded spaces and (2) evaluate the difference between the two. A quantitative evaluation showed that Label-T&C outperform widely used DR evaluation measures (e.g., Trustworthiness and Continuity, Kullback-Leibler divergence) in the accuracy of assessing how well DR embeddings preserve the cluster structure, and that they are also scalable. Moreover, we present case studies demonstrating that Label-T&C can be used to reveal the intrinsic characteristics of DR techniques and their hyperparameters.
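The two-step idea stated in the abstract, (1) estimating how strongly classes form clusters in each space and (2) comparing the two estimates, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the measure defined in the paper: the separability score (centroid distance over within-class spread), the function names `pairwise_class_separability` and `separability_distortion`, and the way the differences are aggregated are all hypothetical. The abstract does not specify which direction of distortion corresponds to Label-Trustworthiness and which to Label-Continuity, so the sketch simply reports both directions.

```python
import numpy as np
from itertools import combinations


def pairwise_class_separability(X, labels):
    """Return {(a, b): score in [0, 1)} for every pair of classes.

    Stand-in score (an assumption, not the paper's measure): centroid
    distance divided by the summed mean within-class spread, squashed
    into [0, 1) with s / (1 + s) so that scores from spaces of
    different dimensionality are at least roughly comparable.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    scores = {}
    for a, b in combinations(np.unique(labels).tolist(), 2):
        A, B = X[labels == a], X[labels == b]
        ca, cb = A.mean(axis=0), B.mean(axis=0)
        spread = (np.linalg.norm(A - ca, axis=1).mean()
                  + np.linalg.norm(B - cb, axis=1).mean())
        s = np.linalg.norm(ca - cb) / (spread + 1e-12)
        scores[(a, b)] = s / (1.0 + s)
    return scores


def separability_distortion(X_high, X_low, labels):
    """Compare class-pair separability before and after DR.

    Requires at least two classes. Returns two values in (0, 1]:
    the first drops when class pairs are less separated in the
    embedding than in the original space (compression), the second
    drops when they are more separated (stretching).
    """
    orig = pairwise_class_separability(X_high, labels)  # step (1), original space
    emb = pairwise_class_separability(X_low, labels)    # step (1), embedded space
    diffs = np.array([orig[p] - emb[p] for p in orig])  # step (2), difference
    compression = np.clip(diffs, 0, None).mean()
    stretching = np.clip(-diffs, 0, None).mean()
    return 1.0 - compression, 1.0 - stretching
```

Under these assumptions, an embedding that leaves every class pair as separable as it was in the original space scores 1.0 in both directions, regardless of whether the classes were well clustered to begin with. This is the property the abstract emphasizes: the embedding is judged against the original cluster structure, not against an idealized one.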