Linear discriminant analysis (LDA) is a widely used technique for data classification. The method offers adequate performance in many classification problems, but it becomes inefficient when the data covariance matrix is ill-conditioned. This often occurs when the dimensionality of the feature space is higher than or comparable to the training data size. Regularized LDA (RLDA) methods based on regularized linear estimators of the data covariance matrix have been proposed to cope with this situation. The performance of RLDA methods is well studied, and optimal regularization schemes have already been proposed. In this paper, we investigate the capability of a positive semidefinite ridge-type estimator of the inverse covariance matrix that coincides with a nonlinear (NL) covariance matrix estimator. The estimator is derived by reformulating the score function of the optimal classifier using linear estimation methods, which eventually results in the proposed NL-RLDA classifier. We derive asymptotic and consistent estimators of the proposed technique's misclassification rate under the assumptions of a double-asymptotic regime and a multivariate Gaussian model for the classes. The consistent estimator, coupled with a one-dimensional grid search, is used to set the value of the regularization parameter required for the proposed NL-RLDA classifier. Performance evaluations based on both synthetic and real data demonstrate the effectiveness of the proposed classifier. The proposed technique outperforms state-of-the-art methods across multiple datasets.
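For orientation, the baseline RLDA setting that the abstract starts from can be sketched in a few lines of Python. This is not the proposed NL-RLDA classifier, whose estimator is not specified in the abstract. The ridge form (S + gamma·I)^{-1} of the inverse covariance estimate, the use of training proportions as class priors, all function names, and the held-out validation error used here in place of the paper's consistent estimator of the misclassification rate are assumptions made for illustration only.

```python
import numpy as np

def fit_rlda(X0, X1, gamma):
    """Fit a two-class ridge-regularized LDA (illustrative baseline).

    Returns (w, b) such that score(x) = x @ w + b; a positive score
    assigns x to class 0, otherwise to class 1.
    """
    n0, n1 = len(X0), len(X1)
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled sample covariance; singular when n0 + n1 - 2 < p.
    S = ((X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)) / (n0 + n1 - 2)
    p = S.shape[0]
    # Assumed ridge form: (S + gamma * I)^{-1} applied to the mean difference.
    w = np.linalg.solve(S + gamma * np.eye(p), m0 - m1)
    # Class priors taken as training proportions (an assumption).
    b = -0.5 * (m0 + m1) @ w + np.log(n0 / n1)
    return w, b

def predict(X, w, b):
    """Return 0 where the discriminant score is positive, else 1."""
    return np.where(X @ w + b > 0, 0, 1)

def select_gamma(X0, X1, V, yv, grid):
    """One-dimensional grid search over gamma.

    Uses held-out validation error as a stand-in for the consistent
    misclassification-rate estimator described in the abstract.
    """
    errors = [np.mean(predict(V, *fit_rlda(X0, X1, g)) != yv) for g in grid]
    return grid[int(np.argmin(errors))], errors

# Synthetic Gaussian example with more features than training samples.
rng = np.random.default_rng(0)
p, n = 40, 15
shift = np.full(p, 0.5)
X0 = rng.normal(size=(n, p)) + shift
X1 = rng.normal(size=(n, p)) - shift
V = np.vstack([rng.normal(size=(200, p)) + shift,
               rng.normal(size=(200, p)) - shift])
yv = np.repeat([0, 1], 200)

grid = np.logspace(-3, 2, 11)
best_gamma, errors = select_gamma(X0, X1, V, yv, grid)
print("selected gamma:", best_gamma)
print("validation error at selected gamma:", min(errors))
```

With 30 training samples in 40 dimensions the pooled covariance is singular, so the unregularized inverse does not exist; the ridge term is what makes the linear solve well-posed, and the grid search picks the amount of shrinkage.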