Tree-based ensembles are known for their outstanding performance in classification and regression problems characterized by feature vectors composed of mixed-type variables from various ranges and domains. However, for regression problems, they are primarily designed to provide deterministic responses or to model the uncertainty of the output with a Gaussian or other parametric distribution. In this work, we introduce TreeFlow, a tree-based approach that combines the benefits of tree ensembles with the ability of normalizing flows to model flexible probability distributions. The main idea is to use a tree-based model as a feature extractor and combine it with a conditional variant of a normalizing flow. Consequently, our approach can model complex distributions of the regression outputs. We evaluate the proposed method on challenging regression benchmarks that vary in data volume, feature characteristics, and target dimensionality. We obtain state-of-the-art (SOTA) results for both probabilistic and deterministic metrics on datasets with multimodal target distributions, and competitive results on unimodal ones, compared to tree-based regression baselines.
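The core idea, tree-derived features conditioning a normalizing flow, can be illustrated with a toy sketch. This is not the TreeFlow implementation. Every specific below is an assumption chosen for illustration. The decision stumps in `STUMPS` stand in for a trained tree ensemble, and the one-hot leaf encoding, the linear conditioner weights `w_shift`/`w_logscale`, and the single fixed `u + a * tanh(u)` layer are all hypothetical. The sketch shows how the change-of-variables formula yields a valid conditional density that, unlike a Gaussian, can be bimodal.

```python
import numpy as np

# Hypothetical stand-in for a trained tree ensemble: depth-1 trees (stumps),
# each a (feature index, threshold) pair. Illustrative assumptions only.
STUMPS = [(0, 0.0), (1, 0.5), (0, 1.0)]


def leaf_features(x: np.ndarray) -> np.ndarray:
    """One-hot encode the leaf each stump routes each row of x to."""
    cols = []
    for feat, thr in STUMPS:
        right = (x[:, feat] > thr).astype(float)
        cols.extend([1.0 - right, right])  # left-leaf and right-leaf indicators
    return np.stack(cols, axis=1)  # shape (n_samples, 2 * len(STUMPS))


def log_prob(y, x, w_shift, w_logscale, a):
    """log p(y | x) for a 1-D conditional flow mapping y to z ~ N(0, 1).

    Step 1 (conditional): u = (y - shift(h)) * exp(-logscale(h)), h = leaf features.
    Step 2 (fixed): z = u + a * tanh(u), strictly increasing for a > -1.
    """
    h = leaf_features(x)
    shift, logscale = h @ w_shift, h @ w_logscale
    u = (y - shift) * np.exp(-logscale)
    z = u + a * np.tanh(u)
    # Change of variables: log N(z; 0, 1) + log|du/dy| + log|dz/du|.
    log_det = -logscale + np.log1p(a * (1.0 - np.tanh(u) ** 2))
    log_base = -0.5 * z**2 - 0.5 * np.log(2.0 * np.pi)
    return log_base + log_det


# Usage with hand-picked (hypothetical) conditioner weights, one per leaf.
w_shift = np.array([0.5, -0.5, 1.0, 0.0, 0.2, -0.2])
w_logscale = np.full(6, 0.1)
x = np.array([[0.3, 0.9]])  # a single input row
grid = np.linspace(-30.0, 30.0, 60001)
xs = np.repeat(x, grid.size, axis=0)
dens = np.exp(log_prob(grid, xs, w_shift, w_logscale, a=-0.9))
total = dens.sum() * (grid[1] - grid[0])
print(total)  # approximately 1.0, so this is a valid density

# With a = -0.9 the density dips at the conditional shift: it is not Gaussian.
h = leaf_features(x)
shift, scale = (h @ w_shift)[0], np.exp((h @ w_logscale)[0])
at_center = np.exp(log_prob(np.array([shift]), x, w_shift, w_logscale, -0.9))[0]
off_center = np.exp(log_prob(np.array([shift + scale]), x, w_shift, w_logscale, -0.9))[0]
print(at_center < off_center)  # True
```

In this sketch the trees only decide which leaf weights feed the flow's parameters, while the nonlinear flow layer supplies the distributional flexibility. A practical model would stack several learned, conditional flow layers rather than one fixed layer.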