Linear Discriminant Analysis (LDA) is a fundamental method for classification. Its simple linear structure facilitates interpretation, and it is naturally suited to multi-class settings. LDA is also closely connected to several classical multivariate techniques, including Fisher's discriminant analysis, canonical correlation analysis, and linear regression. In this paper, we strengthen the connection between LDA and multivariate response regression by establishing an explicit relationship between discriminant directions and regression coefficients. This characterization yields a new regression-based framework for multi-class classification that accommodates structured, regularized, and even non-parametric regression methods. In contrast to existing regression-based approaches, our formulation is particularly amenable to theoretical analysis: we develop a general strategy for deriving bounds on the excess misclassification risk of the proposed classifier across all such regression procedures. As concrete applications, we provide complete theoretical guarantees for two widely used methods -- $\ell_1$-regularization and reduced-rank regression -- neither of which has previously been fully analyzed in the LDA context. The theoretical results are supported by extensive simulation studies and empirical evaluations on real data.
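The link between discriminant directions and regression coefficients can be illustrated numerically with the classical, well-known version of the result: regressing a centered class-indicator matrix on centered features by ordinary least squares gives coefficients whose column space coincides with the span of the LDA directions $\Sigma_W^{-1}(\mu_k - \mu)$. The sketch below checks only this textbook fact on simulated Gaussian data. It is not the explicit relationship or the classifier developed in the paper, and all names in it (`regression_vs_lda_directions`, `column_space_projector`, the simulation settings) are hypothetical choices made for illustration.

```python
import numpy as np


def column_space_projector(A, rank):
    """Orthogonal projector onto the span of the top `rank` left singular vectors of A."""
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    Q = U[:, :rank]
    return Q @ Q.T


def regression_vs_lda_directions(n_per_class=100, p=5, K=3, seed=0):
    """Return the largest entrywise gap between the two column-space projectors."""
    rng = np.random.default_rng(seed)

    # Simulated data (an assumption of this sketch): K Gaussian classes
    # with identity covariance and randomly drawn mean vectors.
    means = rng.normal(scale=2.0, size=(K, p))
    labels = np.repeat(np.arange(K), n_per_class)
    X = means[labels] + rng.normal(size=(K * n_per_class, p))

    # Multivariate response regression: one-hot class indicators on features,
    # with both sides centered so that no intercept is needed.
    Y = np.eye(K)[labels]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    B, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)  # shape (p, K)

    # Classical LDA directions: within-class scatter inverse times
    # the centered class means.
    class_means = np.stack([X[labels == k].mean(axis=0) for k in range(K)])
    M = (class_means - X.mean(axis=0)).T  # shape (p, K), rank K - 1
    resid = X - class_means[labels]
    S_W = resid.T @ resid
    D = np.linalg.solve(S_W, M)  # shape (p, K)

    # Both matrices have rank K - 1; compare the subspaces they span.
    P_reg = column_space_projector(B, K - 1)
    P_lda = column_space_projector(D, K - 1)
    return np.abs(P_reg - P_lda).max()


if __name__ == "__main__":
    print("max projector difference:", regression_vs_lda_directions())
```

The two projectors should agree up to floating-point error, because the total scatter equals the within-class scatter plus a low-rank between-class term, so inverting either one maps the centered class means into the same subspace. Comparing projectors rather than the raw matrices is deliberate: the regression coefficients and the LDA directions differ by an invertible transformation, so only the spanned subspace is expected to match.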