Linear discriminant analysis (LDA) is a widely used technique for data classification. The method offers adequate performance in many classification problems, but its performance degrades when the data covariance matrix is ill-conditioned. This often occurs when the dimensionality of the feature space is higher than or comparable to the training data size. Regularized LDA (RLDA) methods based on regularized linear estimators of the data covariance matrix have been proposed to cope with this situation. The performance of RLDA methods is well studied, and optimal regularization schemes have already been proposed. In this paper, we investigate the capability of a positive semidefinite ridge-type estimator of the inverse covariance matrix that coincides with a nonlinear (NL) covariance matrix estimator. The estimator is derived by reformulating the score function of the optimal classifier using linear estimation methods, which results in the proposed NL-RLDA classifier. We derive asymptotic and consistent estimators of the proposed technique's misclassification rate under the assumptions of a double-asymptotic regime and a multivariate Gaussian model for the classes. The consistent estimator, coupled with a one-dimensional grid search, is used to set the value of the regularization parameter required for the proposed NL-RLDA classifier. Performance evaluations based on both synthetic and real data demonstrate the effectiveness of the proposed classifier, which outperforms state-of-the-art methods across multiple datasets.
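To make the baseline concrete, the following is a minimal sketch of a conventional two-class RLDA classifier with a ridge-regularized pooled covariance matrix and a one-dimensional grid search over the regularization parameter. It is not the proposed NL-RLDA estimator, whose form is not given in the abstract. The function names (`fit_rlda`, `predict_rlda`, `grid_search_gamma`) and the parameter name `gamma` are illustrative assumptions, and the grid search here scores each candidate by held-out validation error rather than by the consistent misclassification-rate estimator described above.

```python
import numpy as np


def fit_rlda(X0, X1, gamma):
    """Fit a ridge-regularized LDA classifier for two classes (illustrative sketch).

    X0, X1: arrays of shape (n_i, p) holding the training samples of each class.
    gamma:  ridge parameter (> 0) added to the diagonal of the pooled covariance.
    """
    n0, n1 = len(X0), len(X1)
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled sample covariance; it is singular when p > n0 + n1 - 2.
    S = ((X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)) / (n0 + n1 - 2)
    # Ridge-type inverse (S + gamma * I)^{-1} applied to the mean difference.
    w = np.linalg.solve(S + gamma * np.eye(S.shape[0]), m0 - m1)
    # Offset: midpoint term plus log prior ratio, with priors estimated as n_i / n.
    b = -0.5 * w @ (m0 + m1) + np.log(n0 / n1)
    return w, b


def predict_rlda(model, X):
    """Assign class 0 where the linear score is positive, class 1 otherwise."""
    w, b = model
    return np.where(X @ w + b > 0, 0, 1)


def grid_search_gamma(X0, X1, X_val, y_val, gammas):
    """Pick the gamma with the lowest held-out error (an assumed stand-in
    for the consistent error estimator used in the paper)."""
    errors = [
        np.mean(predict_rlda(fit_rlda(X0, X1, g), X_val) != y_val)
        for g in gammas
    ]
    best = int(np.argmin(errors))
    return gammas[best], errors[best]
```

A typical call would pass a logarithmic grid such as `np.logspace(-3, 2, 11)` as `gammas`. Because the ridge term keeps `S + gamma * I` positive definite, the linear solve remains well posed even when the feature dimension exceeds the number of training samples.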