Distributional regression aims to estimate the full conditional distribution of a target variable given covariates. Popular methods include quantile regression based on linear models and tree ensembles. We propose a neural network-based distributional regression methodology called `engression'. An engression model is generative in the sense that we can sample from the fitted conditional distribution, and it is also suitable for high-dimensional outcomes. Furthermore, we find that modelling the conditional distribution on training data can constrain the fitted function outside of the training support, which offers a new perspective on the challenging extrapolation problem in nonlinear regression. In particular, for `pre-additive noise' models, where noise is added to the covariates before applying a nonlinear transformation, we show that engression can successfully perform extrapolation under assumptions such as monotonicity, whereas traditional regression approaches such as least-squares or quantile regression fall short under the same assumptions. Our empirical results, from both simulated and real data, validate the effectiveness of the engression method and indicate that the pre-additive noise model is typically suitable for many real-world scenarios. Software implementations of engression are available in both R and Python.
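To make the pre-additive vs. post-additive distinction concrete, here is a minimal NumPy sketch (not the engression package itself; the monotone transformation `g` and the noise scale are illustrative assumptions). In a post-additive model, noise is added after the transformation, Y = g(X) + ε, so the conditional spread is the same for every x. In a pre-additive model, noise enters before the transformation, Y = g(X + ε), so the spread varies with the local slope of g, which is the structure engression exploits for extrapolation.

```python
import numpy as np

rng = np.random.default_rng(0)

def g(x):
    # An illustrative monotone nonlinear transformation.
    return np.sign(x) * np.abs(x) ** 1.5

def sample_post_additive(x, n_samples, noise_scale=0.5):
    # Post-additive noise model: Y = g(X) + eps (classical regression).
    eps = rng.normal(0.0, noise_scale, size=n_samples)
    return g(x) + eps

def sample_pre_additive(x, n_samples, noise_scale=0.5):
    # Pre-additive noise model: Y = g(X + eps).
    eps = rng.normal(0.0, noise_scale, size=n_samples)
    return g(x + eps)

# The post-additive conditional spread is constant in x, while the
# pre-additive spread grows where g is steeper (e.g. at larger |x|).
post_spread = np.std(sample_post_additive(3.0, 20000))
pre_spread_near = np.std(sample_pre_additive(1.0, 20000))
pre_spread_far = np.std(sample_pre_additive(3.0, 20000))
```

The key observation is that samples from the pre-additive model carry information about the shape of g beyond any single training point, which is what makes extrapolation under monotonicity assumptions possible.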