We propose a novel semiparametric classifier based on the Mahalanobis distances of an observation from the competing classes. Our tool is a generalized additive model with the logistic link function that uses these distances as features to estimate the posterior probabilities of the different classes. While popular parametric classifiers like linear and quadratic discriminant analyses are mainly motivated by the normality of the underlying distributions, the proposed classifier is more flexible and free from such parametric assumptions. Since the densities of elliptic distributions are functions of Mahalanobis distances, this classifier works well when the competing classes are (nearly) elliptic. In such cases, it often outperforms popular nonparametric classifiers, especially when the sample size is small compared to the dimension of the data. To cope with non-elliptic and possibly multimodal distributions, we propose a local version of the Mahalanobis distance. Subsequently, we propose another classifier based on a generalized additive model that uses the local Mahalanobis distances as features. This nonparametric classifier usually performs like the Mahalanobis-distance-based semiparametric classifier when the underlying distributions are elliptic, but outperforms it for several non-elliptic and multimodal distributions. We also investigate the behaviour of these two classifiers in high-dimension, low-sample-size situations. A thorough numerical study involving several simulated and real datasets demonstrates the usefulness of the proposed classifiers in comparison to many state-of-the-art methods.
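The basic construction described above (Mahalanobis distances to each class used as features in an additive logistic model) can be illustrated with a minimal sketch. This is not the authors' implementation. It makes several assumptions that are not in the abstract: the data are synthetic two-dimensional Gaussians, the smooth component functions of the generalized additive model are replaced by a crude quadratic basis in each distance, and the fit is done by plain gradient ascent. All function names (`mean_and_cov`, `mahalanobis`, `features`, `fit_logistic`, `predict_proba`) are hypothetical. The local Mahalanobis distance is not sketched, because the abstract does not define it.

```python
import math
import random

def mean_and_cov(points):
    """Sample mean and covariance (sxx, sxy, syy) of a list of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def mahalanobis(p, mean, cov):
    """Mahalanobis distance of p from a class with the given mean and covariance."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = p[0] - mean[0], p[1] - mean[1]
    # Quadratic form d^T Sigma^{-1} d via the closed-form 2x2 inverse.
    q = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(max(q, 0.0))

def features(p, class_stats):
    """Intercept plus (d, d^2) for the distance d to each class.
    Assumption: a quadratic basis stands in for the GAM's smooth functions."""
    row = [1.0]
    for mean, cov in class_stats:
        d = mahalanobis(p, mean, cov)
        row += [d, d * d]
    return row

def sigmoid(z):
    """Numerically stable logistic function (inverse of the logit link)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(rows, labels, lr=0.005, steps=1000):
    """Gradient ascent on the mean Bernoulli log-likelihood."""
    w = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for row, y in zip(rows, labels):
            err = y - sigmoid(sum(wi * xi for wi, xi in zip(w, row)))
            for j, xj in enumerate(row):
                grad[j] += err * xj
        w = [wi + lr * g / n for wi, g in zip(w, grad)]
    return w

def sample(center, scale, n, rng):
    """Draw n points from a spherical Gaussian (synthetic data, an assumption)."""
    return [(rng.gauss(center[0], scale), rng.gauss(center[1], scale))
            for _ in range(n)]

rng = random.Random(0)
train0 = sample((0.0, 0.0), 1.0, 100, rng)
train1 = sample((4.0, 0.0), 1.0, 100, rng)
class_stats = [mean_and_cov(train0), mean_and_cov(train1)]

X = [features(p, class_stats) for p in train0 + train1]
y = [0] * len(train0) + [1] * len(train1)
w = fit_logistic(X, y)

def predict_proba(p):
    """Estimated posterior probability that p belongs to class 1."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, features(p, class_stats))))

test0 = sample((0.0, 0.0), 1.0, 100, rng)
test1 = sample((4.0, 0.0), 1.0, 100, rng)
correct = sum(predict_proba(p) < 0.5 for p in test0)
correct += sum(predict_proba(p) >= 0.5 for p in test1)
accuracy = correct / (len(test0) + len(test1))
print(f"held-out accuracy: {accuracy:.3f}")
```

The point of the sketch is the dimension reduction: whatever the dimension of the observation, the model only sees one distance per class, which is why such a classifier can remain usable when the sample size is small relative to the dimension. In practice one would likely replace the quadratic basis with spline smooths from a dedicated GAM library.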