Linear Discriminant Analysis (LDA) is an important classification approach. Its simple linear form makes it easy to interpret, and it is capable of handling multi-class responses. It is closely related to other classical multivariate statistical techniques, such as Fisher's discriminant analysis, canonical correlation analysis, and linear regression. In this paper we strengthen its connection to multivariate response regression by characterizing the explicit relationship between the discriminant directions and the regression coefficient matrix. This key characterization leads to a new regression-based multi-class classification procedure that is flexible enough to deploy any existing structured, regularized, and even non-parametric, regression method. Moreover, our new formulation is amenable to analysis: we establish a general strategy for analyzing the excess misclassification risk of the proposed classifier under all of the aforementioned regression techniques. As applications, we provide complete theoretical guarantees for using the widely used $\ell_1$-regularization as well as for using reduced-rank regression, neither of which has yet been fully analyzed in the LDA context. Our theoretical findings are corroborated by extensive simulation studies and real data analysis.
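To make the regression-based view concrete, the following is a minimal illustrative sketch (not the paper's proposed procedure) of the classical connection the abstract builds on: regress a one-hot class-indicator matrix on the features by least squares, so the fitted coefficient matrix spans the discriminant directions, then classify by the nearest class centroid in the fitted-score space. All variable names are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three Gaussian classes with a shared covariance (the LDA model).
means = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
X = np.vstack([rng.normal(m, 1.0, size=(100, 2)) for m in means])
y = np.repeat(np.arange(3), 100)

# One-hot response matrix: multivariate regression stands in for LDA.
Y = np.eye(3)[y]

# Least-squares coefficient matrix B (intercept column prepended).
Xc = np.hstack([np.ones((X.shape[0], 1)), X])
B, *_ = np.linalg.lstsq(Xc, Y, rcond=None)

# The fitted scores live in the column space of B, which contains
# the discriminant directions; classify by nearest class centroid there.
scores = Xc @ B
centroids = np.array([scores[y == k].mean(axis=0) for k in range(3)])

def predict(x_new):
    s = np.hstack([1.0, x_new]) @ B
    return int(np.argmin(((s - centroids) ** 2).sum(axis=1)))
```

Swapping the least-squares fit for a penalized one (e.g. an $\ell_1$-penalized or reduced-rank fit of $B$) is, informally, the kind of flexibility the abstract refers to.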