TreeFlow: Going beyond Tree-based Gaussian Probabilistic Regression

Tree-based ensembles are known for their outstanding performance in classification and regression problems whose feature vectors consist of mixed-type variables drawn from various ranges and domains. In regression, however, they are primarily designed to provide deterministic responses or to model the uncertainty of the output with a Gaussian or another parametric distribution. In this work, we introduce TreeFlow, a tree-based approach that combines the benefits of tree ensembles with the ability of normalizing flows to model flexible probability distributions. The main idea is to use a tree-based model as a feature extractor and combine it with a conditional variant of a normalizing flow. Consequently, our approach can model complex, including multimodal, distributions of the regression outputs. We evaluate the proposed method on challenging regression benchmarks with varying data volume, feature characteristics, and target dimensionality. We obtain state-of-the-art (SOTA) results for both probabilistic and deterministic metrics on datasets with multimodal target distributions, and competitive results on unimodal ones, compared to tree-based regression baselines.
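The abstract names two components: a tree model used as a feature extractor, and a conditional normalizing flow whose parameters depend on the extracted features. The sketch below illustrates that combination for a one-dimensional target under assumptions that do not come from the paper. The "ensemble" is a hypothetical list of depth-1 stumps (`STUMPS`). The features are the one-hot encoded leaves each sample lands in (`leaf_features`), which is one common way to use trees as feature extractors. The flow is a hypothetical monotone transform `z = f(y; h) = a*y + b + sum_k w_k * tanh(c_k * (y - d_k))` whose parameters are linear functions of the leaf features (`ConditionalTanhFlow`). Keeping `a`, `w_k`, `c_k` positive makes `f` a strictly increasing bijection, so the change-of-variables formula `log p(y | x) = log N(f(y); 0, 1) + log f'(y)` yields a valid density. TreeFlow's own ensemble and flow architecture may differ.

```python
import math
import random

# Hypothetical pre-fitted "ensemble": each tree is a depth-1 stump given as
# (feature_index, threshold). A real tree ensemble would be deeper and fitted
# to data; stumps keep this sketch dependency-free.
STUMPS = [(0, 0.0), (1, 0.5), (0, 1.0)]


def leaf_features(x, stumps=STUMPS):
    """Return a one-hot vector marking the leaf each stump routes x into."""
    h = []
    for feat, thr in stumps:
        # Each stump has two leaves, encoded in the order [left, right].
        h.extend([0.0, 1.0] if x[feat] > thr else [1.0, 0.0])
    return h


def softplus(v):
    """Numerically stable log(1 + exp(v)), used to keep parameters positive."""
    return max(v, 0.0) + math.log1p(math.exp(-abs(v)))


class ConditionalTanhFlow:
    """Hypothetical 1-D conditional flow z = f(y; h), strictly increasing in y.

    f(y) = a*y + b + sum_k w_k * tanh(c_k * (y - d_k)), with a, w_k, c_k > 0,
    so f is a bijection from R to R. Every parameter is a linear function
    of the leaf features h.
    """

    def __init__(self, n_features, n_terms=2, seed=0):
        rng = random.Random(seed)
        self.n_terms = n_terms
        n_params = 2 + 3 * n_terms  # a, b, then (w, c, d) per tanh term
        self.W = [[rng.gauss(0.0, 0.5) for _ in range(n_features)]
                  for _ in range(n_params)]

    def params(self, h):
        """Map leaf features h to flow parameters (a, b, [(w, c, d), ...])."""
        raw = [sum(wi * hi for wi, hi in zip(row, h)) for row in self.W]
        a = softplus(raw[0]) + 1e-3  # slope bounded away from zero
        b = raw[1]
        terms = []
        for k in range(self.n_terms):
            r_w, r_c, r_d = raw[2 + 3 * k:5 + 3 * k]
            terms.append((softplus(r_w), softplus(r_c) + 1e-3, r_d))
        return a, b, terms

    def log_prob(self, y, h):
        """Change of variables: log p(y | h) = log N(f(y); 0, 1) + log f'(y)."""
        a, b, terms = self.params(h)
        z, dz = a * y + b, a
        for w, c, d in terms:
            t = math.tanh(c * (y - d))
            z += w * t
            # Derivative of w * tanh(c * (y - d)) with respect to y.
            dz += w * c * (1.0 - t * t)
        return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi) + math.log(dz)


if __name__ == "__main__":
    flow = ConditionalTanhFlow(n_features=2 * len(STUMPS))
    h = leaf_features([0.3, 0.7])  # tree-based features of one sample x
    print(flow.log_prob(0.0, h))
```

With a small slope `a` and steep tanh terms placed at separated locations `d_k`, the resulting density becomes bimodal, which a single Gaussian output head cannot represent. In practice the weights `W` would be fitted by maximizing the summed `log_prob` over training pairs, for example with gradient ascent; that training loop is omitted here. A multi-dimensional target would need a multivariate flow (for example, coupling or autoregressive layers) instead of this scalar transform.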
