We propose a novel ensemble method for regression called the Riemann-Lebesgue Forest (RLF). The core idea of RLF is to mimic how a measurable function can be approximated by partitioning its range into a few intervals. With this idea in mind, we develop a new tree learner, the Riemann-Lebesgue Tree, which at each non-terminal node has a chance to split either on the response $Y$ or along a direction in the feature space $\mathbf{X}$. We generalize the asymptotic performance of RLF under different parameter settings, mainly through the Hoeffding decomposition \cite{Vaart} and Stein's method \cite{Chen2010NormalAB}. When the underlying function $Y=f(\mathbf{X})$ follows an additive regression model, RLF is consistent, following the argument of \cite{Scornet2014ConsistencyOR}. The competitive performance of RLF against the original random forest \cite{Breiman2001RandomF} is demonstrated by experiments on simulated data and real-world datasets.
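To make the range-partitioning idea concrete, the following minimal Python sketch approximates a function by a piecewise-constant one whose pieces are defined by intervals of the response rather than of the input. It illustrates only the underlying approximation principle and is not the RLF algorithm itself; the function name `range_partition_approx`, the equal-width range intervals, and the sine test function are assumptions chosen for illustration.

```python
import math


def range_partition_approx(f, xs, n_bins):
    """Approximate f at the points xs by a function that is constant on each
    set {x : f(x) lies in the k-th of n_bins equal-width range intervals}.

    Hypothetical helper for illustration; not taken from the RLF paper.
    """
    ys = [f(x) for x in xs]
    lo, hi = min(ys), max(ys)
    if hi == lo:
        return list(ys)  # constant function: nothing to partition
    width = (hi - lo) / n_bins
    # Assign each point to a range interval (the maximum goes in the last one).
    bins = [min(int((y - lo) / width), n_bins - 1) for y in ys]
    # Represent each interval by the mean response of the points inside it.
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for b, y in zip(bins, ys):
        sums[b] += y
        counts[b] += 1
    return [sums[b] / counts[b] for b in bins]


if __name__ == "__main__":
    xs = [i / 200 for i in range(201)]

    def f(x):
        return math.sin(2 * math.pi * x)

    for n_bins in (2, 4, 8):
        approx = range_partition_approx(f, xs, n_bins)
        max_err = max(abs(a - f(x)) for a, x in zip(approx, xs))
        # The error is bounded by the interval width, (max f - min f) / n_bins.
        print(n_bins, max_err)
```

Because each point and its interval mean lie in the same range interval, the pointwise error cannot exceed the interval width, however irregular the function is in $\mathbf{X}$. Under the reading above, a split on the response $Y$ in a tree would play the role of a single such cut of the range.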