The nearest neighbor classifier is arguably the simplest and most popular nonparametric classifier in the literature. However, because pairwise distances concentrate and the neighborhood structure breaks down, this classifier often performs poorly in high-dimension, low-sample-size (HDLSS) situations, especially when the scale difference between the competing classes dominates their location difference. Several attempts have been made in the literature to address this problem. In this article, we discuss some of these existing methods and propose some new ones. We carry out some theoretical investigations in this regard and analyze several simulated and benchmark datasets to compare the empirical performance of the proposed methods with that of some of the existing ones.
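The HDLSS failure mode described above can be illustrated with a small simulation. The following is a minimal sketch, not a method from the article: it assumes two Gaussian classes with the same location (zero mean) and different scales (standard deviations 1 and `sigma`), and the helper names `one_nn_predict` and `simulate`, the sample sizes, and the value `sigma=2.0` are all hypothetical choices made for illustration. In high dimension the squared distance from a large-scale test point to a small-scale training point concentrates near (1 + sigma^2)d, which is below the 2 sigma^2 d it has to a training point from its own class, so the plain 1-NN rule tends to assign nearly every observation to the small-scale class.

```python
import numpy as np


def one_nn_predict(train_x, train_y, test_x):
    """Plain 1-NN with Euclidean distance (hypothetical helper)."""
    # Squared distances via ||a||^2 + ||b||^2 - 2 a.b for every test/train pair.
    d2 = (
        (test_x ** 2).sum(axis=1)[:, None]
        + (train_x ** 2).sum(axis=1)[None, :]
        - 2.0 * test_x @ train_x.T
    )
    # Label of the closest training point for each test point.
    return train_y[d2.argmin(axis=1)]


def simulate(dim, n_train=20, n_test=200, sigma=2.0, seed=0):
    """Return per-class 1-NN accuracy for N(0, I) versus N(0, sigma^2 I).

    Both classes share the same location; only the scale differs.
    All settings here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    train_x = np.vstack([
        rng.normal(0.0, 1.0, size=(n_train, dim)),    # class 0: small scale
        rng.normal(0.0, sigma, size=(n_train, dim)),  # class 1: large scale
    ])
    train_y = np.repeat([0, 1], n_train)
    test_x = np.vstack([
        rng.normal(0.0, 1.0, size=(n_test, dim)),
        rng.normal(0.0, sigma, size=(n_test, dim)),
    ])
    test_y = np.repeat([0, 1], n_test)

    pred = one_nn_predict(train_x, train_y, test_x)
    acc_small = float((pred[test_y == 0] == 0).mean())
    acc_large = float((pred[test_y == 1] == 1).mean())
    return acc_small, acc_large


if __name__ == "__main__":
    # Compare a low-dimensional case with increasingly HDLSS-like ones.
    for dim in (2, 50, 1000):
        acc_small, acc_large = simulate(dim)
        print(f"dim={dim:5d}  small-scale acc={acc_small:.2f}  "
              f"large-scale acc={acc_large:.2f}")
```

Under these assumptions the accuracy on the large-scale class should fall toward zero as `dim` grows while the small-scale class stays almost perfectly classified, giving an overall error rate near 50% even though the two classes are well separated in scale.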