We propose a computationally efficient algorithm for gradient-based linear dimension reduction and high-dimensional regression. The algorithm first fits a Mondrian forest and then uses this estimator to estimate the expected gradient outer product (EGOP) of the regression function, from which it identifies a relevant subspace of the input features. In addition, we introduce an iterative approach, the Transformed Iterative Mondrian (TrIM) forest, which improves the Mondrian forest estimator by using the EGOP estimate to update the features and weights used by the Mondrian partitioning mechanism. We establish consistency guarantees and convergence rates both for the EGOP estimate and for the random forest estimator obtained after one iteration of TrIM. Finally, we demonstrate the effectiveness of the proposed algorithm for learning the relevant feature subspace in a range of settings on both simulated and real data.
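The central quantity above is the EGOP matrix H = E[∇f(X) ∇f(X)ᵀ], whose leading eigenvectors span the directions along which the regression function actually varies. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a known test function `f` as a stand-in for a fitted Mondrian forest, central finite differences for the gradients, a Monte Carlo average over sample inputs, and power iteration for the leading eigenvector. All function names are hypothetical.

```python
import math
import random


def finite_diff_grad(f, x, h=1e-4):
    """Central finite-difference estimate of the gradient of f at x."""
    grad = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad


def estimate_egop(f, samples, h=1e-4):
    """Monte Carlo estimate of H = E[grad f(X) grad f(X)^T] over the samples."""
    d = len(samples[0])
    H = [[0.0] * d for _ in range(d)]
    for x in samples:
        g = finite_diff_grad(f, x, h)
        for i in range(d):
            for j in range(d):
                H[i][j] += g[i] * g[j]
    n = len(samples)
    return [[v / n for v in row] for row in H]


def top_eigenvector(H, iters=200):
    """Leading eigenpair of a symmetric PSD matrix via power iteration."""
    d = len(H)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(H[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        v = [wi / norm for wi in w]
    # Rayleigh quotient v^T H v gives the corresponding eigenvalue.
    lam = sum(v[i] * sum(H[i][j] * v[j] for j in range(d)) for i in range(d))
    return lam, v


# Toy example: f depends on 5 inputs only through the direction (1, 2, 0, 0, 0).
rng = random.Random(0)
X = [[rng.random() for _ in range(5)] for _ in range(200)]
f = lambda x: math.sin(x[0] + 2 * x[1])

H = estimate_egop(f, X)
lam, v = top_eigenvector(H)
print(round(lam, 3), [round(c, 3) for c in v])
```

Here the EGOP is rank one, and the recovered direction is close to (1, 2, 0, 0, 0)/√5, that is, approximately (0.447, 0.894, 0, 0, 0). In the approach described in the abstract, the gradients would instead come from the forest estimator, and the resulting eigenvectors and eigenvalues would inform the features and weights used in the next round of Mondrian partitioning. The exact update rule is not specified here.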