Linear Discriminant Analysis (LDA) is an important classification approach. Its simple linear form makes it easy to interpret, and it can handle multi-class responses. It is closely related to other classical multivariate statistical techniques, such as Fisher's discriminant analysis, canonical correlation analysis and linear regression. In this paper we strengthen its connection to multivariate response regression by characterizing the explicit relationship between the discriminant directions and the regression coefficient matrix. This key characterization leads to a new regression-based multi-class classification procedure that is flexible enough to deploy any existing structured, regularized, or even non-parametric regression method. Moreover, our new formulation is generally easier to analyze than existing regression-based LDA procedures. In particular, we provide complete theoretical guarantees for the widely used $\ell_1$-regularization, which has not yet been fully analyzed in the LDA context. Our theoretical findings are corroborated by extensive simulation studies and real data analysis.
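The abstract's starting point is the classical link between LDA and regressing class indicators on the predictors. The sketch below is not the paper's procedure. It only checks that classical, unpenalized fact numerically. For a centered predictor matrix $X$, the top $K-1$ LDA directions (solutions of $S_B v = \lambda S_W v$) lie in the column space of the ordinary least-squares coefficient matrix $B$ obtained by regressing the $n \times K$ class-indicator matrix on $X$. The simulated Gaussian data, the helper names `lda_directions` and `ols_indicator_coef`, and the tolerance are illustrative assumptions.

```python
import numpy as np


def lda_directions(X, labels, n_classes):
    """Top (K-1) classical LDA directions, solving S_B v = lambda * S_W v."""
    Xc = X - X.mean(axis=0)
    p = X.shape[1]
    S_W = np.zeros((p, p))  # within-class scatter
    S_B = np.zeros((p, p))  # between-class scatter (class means of centered X)
    for k in range(n_classes):
        Xk = Xc[labels == k]
        mu_k = Xk.mean(axis=0)
        S_W += (Xk - mu_k).T @ (Xk - mu_k)
        S_B += len(Xk) * np.outer(mu_k, mu_k)
    # Whiten with the Cholesky factor S_W = L L^T to get a symmetric eigenproblem.
    L = np.linalg.cholesky(S_W)
    Linv = np.linalg.inv(L)
    w, U = np.linalg.eigh(Linv @ S_B @ Linv.T)
    order = np.argsort(w)[::-1][: n_classes - 1]
    # Map the whitened eigenvectors back: v = L^{-T} u.
    return np.linalg.solve(L.T, U[:, order])


def ols_indicator_coef(X, labels, n_classes):
    """OLS coefficients from regressing the class-indicator matrix on centered X."""
    Xc = X - X.mean(axis=0)
    Y = np.eye(n_classes)[labels]  # n x K one-hot indicator matrix
    B, *_ = np.linalg.lstsq(Xc, Y, rcond=None)
    return B  # p x K, rank at most K-1


# Hypothetical simulated data: K Gaussian classes with a shared identity covariance.
rng = np.random.default_rng(0)
n_per, p, K = 100, 5, 3
means = rng.normal(scale=2.0, size=(K, p))
X = np.vstack([rng.normal(size=(n_per, p)) + means[k] for k in range(K)])
labels = np.repeat(np.arange(K), n_per)

V = lda_directions(X, labels, K)
B = ols_indicator_coef(X, labels, K)

# Project the LDA directions onto col(B); the relative residual should be ~0.
coef, *_ = np.linalg.lstsq(B, V, rcond=None)
residual = np.linalg.norm(V - B @ coef) / np.linalg.norm(V)
print(f"relative residual: {residual:.2e}")
```

The printed residual should be on the order of floating-point round-off. This matches the classical argument: every generalized eigenvector with a nonzero eigenvalue is also an eigenvector of $S_T^{-1} S_B = B\,(Y^\top Y)^{-1} Y^\top X$. Swapping the `lstsq` call for a structured or regularized regressor is where a regression-based procedure gains flexibility. How the paper does this, including its $\ell_1$ variant, is not reproduced here.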