We propose a computationally efficient algorithm for gradient-based linear dimension reduction and high-dimensional regression. The algorithm first fits a Mondrian forest and uses this estimator to identify a relevant feature subspace of the inputs from an estimate of the expected gradient outer product (EGOP) of the regression function. We also introduce an iterative approach, the Transformed Iterative Mondrian (TrIM) forest, which improves the Mondrian forest estimator by using the EGOP estimate to update the set of features and weights used by the Mondrian partitioning mechanism. We obtain consistency guarantees and convergence rates for the estimate of the EGOP matrix and for the random forest estimator obtained from one iteration of the TrIM algorithm. Lastly, we demonstrate the effectiveness of the proposed algorithm at learning the relevant feature subspace in a variety of settings with both simulated and real data.
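To make the role of the EGOP concrete, the following is a minimal sketch, not the estimator analyzed in this work. It assumes a smooth stand-in predictor `predict` (a known single-index function, used here in place of a fitted Mondrian forest) and approximates gradients by central finite differences, which is an illustrative choice: a tree-based predictor is piecewise constant, so it would need a different gradient estimate. The names `egop_finite_difference`, `predict`, `w`, and the step size `h` are all hypothetical. The sketch shows how the leading eigenvectors of the matrix E[∇f(X) ∇f(X)ᵀ] can recover the relevant feature subspace.

```python
import numpy as np


def egop_finite_difference(predict, X, h=1e-3):
    """Approximate E[grad f(X) grad f(X)^T] using central differences of `predict`.

    `predict` maps an (n, d) array to an (n,) array. This is an assumed,
    illustrative gradient estimate, not the paper's EGOP estimator.
    """
    n, d = X.shape
    grads = np.empty((n, d))
    for j in range(d):
        step = np.zeros(d)
        step[j] = h
        # Central difference along coordinate j, evaluated at every sample.
        grads[:, j] = (predict(X + step) - predict(X - step)) / (2.0 * h)
    # Average of the outer products grad_i grad_i^T over the sample.
    return grads.T @ grads / n


rng = np.random.default_rng(0)
d = 5
X = rng.uniform(-1.0, 1.0, size=(2000, d))

# Hypothetical ground truth: f depends on x only through the direction w.
w = np.array([1.0, 1.0, 0.0, 0.0, 0.0]) / np.sqrt(2.0)


def predict(Z):
    # Smooth stand-in for a fitted regressor (assumption for illustration).
    return np.sin(Z @ w)


H = egop_finite_difference(predict, X)

# eigh returns eigenvalues in ascending order, so the last column of
# eigvecs is the leading eigenvector of the symmetric matrix H.
eigvals, eigvecs = np.linalg.eigh(H)
top = eigvecs[:, -1]

# |cos| of the angle between the estimated and true relevant directions.
alignment = abs(top @ w)
```

Because the stand-in function varies only along `w`, its EGOP has rank one and `alignment` should be close to 1. In an iterative scheme of the kind described above, an estimate such as `H` would then be used to transform or reweight the inputs before refitting the forest.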