Distributional regression aims to estimate the full conditional distribution of a target variable given covariates. Popular methods include quantile regression based on linear models and tree ensembles. We propose a neural network-based distributional regression methodology called `engression'. An engression model is generative in the sense that we can sample from the fitted conditional distribution, and it is also suitable for high-dimensional outcomes. Furthermore, we find that modelling the conditional distribution on training data can constrain the fitted function outside of the training support, which offers a new perspective on the challenging extrapolation problem in nonlinear regression. In particular, for `pre-additive noise' models, where noise is added to the covariates before applying a nonlinear transformation, we show that engression can successfully perform extrapolation under assumptions such as monotonicity, whereas traditional regression approaches such as least-squares or quantile regression fall short under the same assumptions. Our empirical results, from both simulated and real data, validate the effectiveness of the engression method and indicate that the pre-additive noise model is typically suitable for many real-world scenarios. Software implementations of engression are available in both R and Python.
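To make the pre-additive vs. post-additive distinction concrete, here is a minimal NumPy sketch (not the engression package itself; the monotone transformation `g` and the noise scale are illustrative assumptions). In a post-additive model, noise is added after the transformation, Y = g(X) + ε, so the conditional spread is the same for every x. In a pre-additive model, noise enters before the transformation, Y = g(X + ε), so the spread varies with the local slope of g, which is the structure engression exploits for extrapolation.

```python
import numpy as np

rng = np.random.default_rng(0)

def g(x):
    # An illustrative monotone nonlinear transformation.
    return np.sign(x) * np.abs(x) ** 1.5

def sample_post_additive(x, n_samples, noise_scale=0.5):
    # Post-additive noise model: Y = g(X) + eps (classical regression).
    eps = rng.normal(0.0, noise_scale, size=n_samples)
    return g(x) + eps

def sample_pre_additive(x, n_samples, noise_scale=0.5):
    # Pre-additive noise model: Y = g(X + eps).
    eps = rng.normal(0.0, noise_scale, size=n_samples)
    return g(x + eps)

# The post-additive conditional spread is constant in x, while the
# pre-additive spread grows where g is steeper (e.g. at larger |x|).
post_spread = np.std(sample_post_additive(3.0, 20000))
pre_spread_near = np.std(sample_pre_additive(1.0, 20000))
pre_spread_far = np.std(sample_pre_additive(3.0, 20000))
```

The key observation is that samples from the pre-additive model carry information about the shape of g beyond any single training point, which is what makes extrapolation under monotonicity assumptions possible.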