This paper investigates a new approach to estimating the gradient of the conditional probability given the covariates in the binary classification framework. The proposed approach fits a localized nearest-neighbor logistic model with an $\ell_1$ penalty to cope with possibly high-dimensional covariates. Our theoretical analysis shows that the pointwise convergence rate of the gradient estimator is optimal under very mild assumptions. Moreover, using the outer product of such gradient estimates at several points in the covariate space, we provide a new method for estimating the central subspace, a well-known object that enables dimension reduction within the covariate space. Our implementation uses cross-validation on the misclassification rate to estimate the dimension of this subspace. We find that the proposed approach outperforms existing competitors on both synthetic and real data.
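The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a local model of the form $\mathrm{logit}\, P(Y=1 \mid X=z) \approx a + b^\top (z - x)$ fitted on the $k$ nearest neighbors of $x$ with an $\ell_1$ penalty on $b$, a gradient estimate $\sigma(a)(1-\sigma(a))\, b$, and a subspace estimate given by the leading eigenvectors of the averaged outer products. The function names `local_logistic_gradient` and `estimate_central_subspace`, the proximal-gradient solver, and the parameters `k`, `lam`, and `n_points` are all hypothetical choices made for illustration.

```python
import numpy as np


def _sigmoid(t):
    # Clip the argument to avoid overflow in exp for extreme values.
    return 1.0 / (1.0 + np.exp(-np.clip(t, -30.0, 30.0)))


def local_logistic_gradient(X, y, x0, k=50, lam=0.05, n_iter=500):
    """Illustrative gradient estimate of P(Y=1 | X=x) at x0.

    Assumption: l1-penalized logistic fit on the k nearest neighbors of x0,
    solved by proximal gradient descent (ISTA). Labels y must be in {0, 1}.
    """
    dist = np.linalg.norm(X - x0, axis=1)
    idx = np.argsort(dist)[:k]          # indices of the nearest neighbors
    Z = X[idx] - x0                     # covariates centered at x0
    t = y[idx].astype(float)
    m = len(idx)

    a, b = 0.0, np.zeros(X.shape[1])
    # Step size from a Lipschitz bound on the gradient of the mean logistic loss.
    step = 1.0 / (0.25 * (1.0 + np.max(np.sum(Z ** 2, axis=1))))
    for _ in range(n_iter):
        r = _sigmoid(a + Z @ b) - t     # residuals at the current iterate
        a -= step * r.mean()            # intercept is left unpenalized
        b_tmp = b - step * (Z.T @ r) / m
        # Soft-thresholding: proximal operator of the l1 penalty.
        b = np.sign(b_tmp) * np.maximum(np.abs(b_tmp) - step * lam, 0.0)

    p0 = _sigmoid(a)
    return p0 * (1.0 - p0) * b          # chain rule through the logistic link


def estimate_central_subspace(X, y, d, n_points=30, seed=0, **kwargs):
    """Top-d eigenvectors of the averaged outer products of local gradients."""
    rng = np.random.default_rng(seed)
    n_pts = min(n_points, len(X))
    pts = X[rng.choice(len(X), size=n_pts, replace=False)]
    G = np.array([local_logistic_gradient(X, y, x0, **kwargs) for x0 in pts])
    M = G.T @ G / len(G)                # average of g g^T over evaluation points
    _, eigvec = np.linalg.eigh(M)       # eigenvalues in ascending order
    return eigvec[:, ::-1][:, :d]       # columns ordered by decreasing eigenvalue
```

In this sketch the dimension `d` is supplied by the caller. Following the abstract, it could instead be selected by cross-validating the misclassification rate of a classifier trained on the projected covariates `X @ B` over a grid of candidate dimensions.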