Classification is a fundamental problem in many fields, yet it becomes considerably harder when the number of features is large, a scenario common in modern applications such as identifying tumor subtypes from genomic data or categorizing customer attitudes from online reviews. We propose a novel framework that uses the ranks of pairwise distances among observations and identifies consistent patterns in moderate- to high-dimensional data that previous methods have overlooked. The proposed method exhibits superior performance across a variety of scenarios, from high-dimensional data to network data. We further study a typical setting to investigate the key quantities that play essential roles in our framework; these quantities reveal the framework's ability to distinguish differences in the first and/or second moments, as well as differences in higher moments.
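To make the idea of "ranks of pairwise distances" concrete, the following is a minimal sketch and not the proposed framework itself. It assumes Euclidean distance and row-wise ranking, and it pairs the rank matrix with a deliberately simple, hypothetical decision rule (assign a new observation to the class whose training points have the smallest mean distance rank). The function names `pairwise_distance_ranks` and `classify_by_mean_rank` are illustrative and do not come from the paper.

```python
import numpy as np


def pairwise_distance_ranks(X):
    """Row-wise ranks of Euclidean distances (an assumed choice of metric).

    R[i, j] is the rank of d(i, j) among the distances from observation i
    to every other observation; the diagonal is set to 0.
    """
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(D, np.inf)  # push the self-distance to the last rank
    # argsort of argsort yields 0-based ranks; ties are broken by index order
    order = D.argsort(axis=1, kind="stable")
    R = order.argsort(axis=1, kind="stable") + 1
    np.fill_diagonal(R, 0)
    return R


def classify_by_mean_rank(X_train, y_train, x_new):
    """Hypothetical rule: pick the class with the smallest mean rank
    of distance from x_new. This is an illustration only."""
    y_train = np.asarray(y_train)
    X_all = np.vstack([X_train, x_new])
    # last row holds ranks of distances from x_new to each training point
    ranks_from_new = pairwise_distance_ranks(X_all)[-1, :-1]
    labels = np.unique(y_train)
    mean_ranks = [ranks_from_new[y_train == c].mean() for c in labels]
    return labels[int(np.argmin(mean_ranks))]


# Toy usage: two classes that differ in mean, 50 features each
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0.0, 1.0, (20, 50)),
                     rng.normal(3.0, 1.0, (20, 50))])
y_train = np.array([0] * 20 + [1] * 20)
x_new = rng.normal(3.0, 1.0, 50)
predicted = classify_by_mean_rank(X_train, y_train, x_new)
```

Ranks are invariant to any monotone rescaling of the distances, which is one reason to work with them rather than raw distances. A mean-rank rule like this one mainly reflects location differences; picking up second-moment or higher-moment differences would need a richer summary of the rank matrix than this sketch provides.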