We propose Partition Trees, a tree-based framework for conditional density estimation over general outcome spaces, supporting both continuous and categorical variables within a unified formulation. Our approach models conditional distributions as piecewise-constant densities on data-adaptive partitions and learns trees by directly minimizing the conditional negative log-likelihood. This yields a scalable, nonparametric alternative to existing probabilistic trees that makes no parametric assumptions about the target distribution. We further introduce Partition Forests, an ensemble extension obtained by averaging conditional densities. Empirically, we demonstrate improved probabilistic prediction over CART-style trees and competitive or superior performance compared to state-of-the-art probabilistic tree methods and Random Forests, along with robustness to redundant features and heteroscedastic noise.
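To make the core idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation: at a leaf, the conditional density of a continuous outcome is modeled as piecewise constant over a fixed set of bins, and an axis-aligned split is scored by the conditional negative log-likelihood of the resulting leaf densities. All names (`piecewise_constant_density`, `leaf_nll`, `best_split`) and the specific binning and smoothing choices are illustrative assumptions.

```python
# Sketch of piecewise-constant conditional density estimation with an NLL split
# criterion, under the assumptions stated above.
import numpy as np

def piecewise_constant_density(y, bin_edges, alpha=1e-6):
    """Histogram-style density on a partition of the outcome space.

    Returns the density value inside each bin (smoothed counts normalized by
    sample size and bin width) so that the NLL below stays finite.
    """
    counts, _ = np.histogram(y, bins=bin_edges)
    widths = np.diff(bin_edges)
    probs = (counts + alpha) / (counts.sum() + alpha * len(counts))
    return probs / widths

def leaf_nll(y, bin_edges):
    """Negative log-likelihood of y under its own piecewise-constant density."""
    dens = piecewise_constant_density(y, bin_edges)
    idx = np.clip(np.digitize(y, bin_edges) - 1, 0, len(dens) - 1)
    return -np.log(dens[idx]).sum()

def best_split(x, y, bin_edges):
    """Greedy split on a single feature minimizing the total leaf NLL."""
    order = np.argsort(x)
    best_score, best_thr = np.inf, None
    for i in range(1, len(x)):
        thr = 0.5 * (x[order[i - 1]] + x[order[i]])
        left, right = y[x <= thr], y[x > thr]
        score = leaf_nll(left, bin_edges) + leaf_nll(right, bin_edges)
        if score < best_score:
            best_score, best_thr = score, thr
    return best_score, best_thr

# Toy usage: heteroscedastic data, bins spanning the observed outcome range.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 500)
y = rng.normal(0.0, 0.1 + x)          # noise level grows with x
edges = np.linspace(y.min(), y.max(), 21)
print(best_split(x, y, edges))
```

In this toy setting the NLL criterion prefers splits that separate regions of differing outcome spread, which is the behavior the abstract attributes to the method under heteroscedastic noise; a forest-style estimate would average the leaf densities produced by several such trees.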