Sufficient dimension reduction has received much interest over the past 30 years. Most existing approaches focus on statistical models linking the response to the covariate through a regression equation, and as such are not adapted to binary classification problems. We address the question of dimension reduction for binary classification by fitting a localized nearest-neighbor logistic model with $\ell_1$-penalty in order to estimate the gradient of the conditional probability of interest. Our theoretical analysis shows that the pointwise convergence rate of the gradient estimator is optimal under very mild conditions. The dimension reduction subspace is estimated using an outer product of such gradient estimates at several points in the covariate space. Our implementation uses cross-validation on the misclassification rate to estimate the dimension of this subspace. We find that the proposed approach outperforms existing competitors in synthetic and real data applications.
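The pipeline described in the abstract (a local $\ell_1$-penalized logistic fit on nearest neighbors, a gradient estimate at each point, then an outer product of those gradients) can be illustrated with a minimal sketch. This is not the authors' implementation. The solver (plain proximal gradient descent), the synthetic single-index data, the function names `local_gradient` and `leading_direction`, and all tuning values (`k`, `lam`, `step`, `iters`, the 20 anchor points) are assumptions chosen for illustration. The subspace dimension is fixed to 1 here rather than selected by cross-validation on the misclassification rate.

```python
import math
import random

def sigmoid(s):
    # Numerically stable logistic function.
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

def soft_threshold(v, t):
    # Proximal operator of the l1 penalty t * |v|.
    return math.copysign(max(abs(v) - t, 0.0), v)

def local_gradient(X, y, x0, k=80, lam=0.02, step=1.0, iters=200):
    """Estimate the gradient of p(x) = P(Y=1 | X=x) at x0.

    Fits logit p(x) ~ a + b . (x - x0) on the k nearest neighbors of x0 by
    proximal gradient descent, with an l1 penalty on b only (assumed solver).
    By the chain rule, the gradient of p at x0 is p0 * (1 - p0) * b.
    """
    d = len(x0)
    nearest = sorted(range(len(X)), key=lambda i: math.dist(X[i], x0))[:k]
    Z = [[X[i][j] - x0[j] for j in range(d)] for i in nearest]
    labels = [y[i] for i in nearest]
    a, b = 0.0, [0.0] * d
    for _ in range(iters):
        grad_a, grad_b = 0.0, [0.0] * d
        for z, t in zip(Z, labels):
            # Residual of the local logistic model, averaged over neighbors.
            r = (sigmoid(a + sum(bj * zj for bj, zj in zip(b, z))) - t) / k
            grad_a += r
            for j in range(d):
                grad_b[j] += r * z[j]
        a -= step * grad_a
        b = [soft_threshold(bj - step * gj, step * lam) for bj, gj in zip(b, grad_b)]
    p0 = sigmoid(a)
    return [p0 * (1.0 - p0) * bj for bj in b]

def leading_direction(grads, iters=100):
    """Leading eigenvector of the averaged outer product, by power iteration."""
    d = len(grads[0])
    M = [[sum(g[i] * g[j] for g in grads) / len(grads) for j in range(d)]
         for i in range(d)]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm == 0.0:  # every gradient estimate was shrunk to zero
            break
        v = [wi / norm for wi in w]
    return v

# Synthetic example (assumed): the label depends on X only through true_dir . X.
random.seed(0)
d, n = 5, 1500
true_dir = [1 / math.sqrt(2), 1 / math.sqrt(2), 0.0, 0.0, 0.0]
X = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(n)]
y = [1 if random.random() < sigmoid(3 * sum(u * x for u, x in zip(true_dir, xi))) else 0
     for xi in X]

grads = [local_gradient(X, y, x0) for x0 in X[:20]]  # 20 anchor points
direction = leading_direction(grads)
alignment = abs(sum(u * v for u, v in zip(true_dir, direction)))
print(f"|cos| between estimated and true direction: {alignment:.3f}")
```

For a subspace of dimension greater than 1, the same averaged matrix would be decomposed into its leading eigenvectors. The number of eigenvectors to keep could then be compared by cross-validated misclassification rate, as the abstract describes.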