Learning complex distributions is a fundamental challenge in contemporary applications. Generative models, such as diffusion models, have demonstrated remarkable success in overcoming many limitations of traditional statistical methods. Shen and Meinshausen (2024) introduced engression, a generative approach based on scoring rules that maps noise (and covariates, if available) directly to data. While effective, engression struggles with highly complex distributions, such as those encountered in image data. In this work, we extend engression to improve its capability in learning complex distributions. We propose a framework that defines a general forward process transitioning from the target distribution to a known distribution (e.g., Gaussian) and then learns a reverse Markov process using multiple engression models. This reverse process reconstructs the target distribution step by step. Our approach supports general forward processes, allows for dimension reduction, and naturally discretizes the generative process. As a special case, when using a diffusion-based forward process, our framework offers a method to discretize the training and inference of diffusion models efficiently. Empirical evaluations on simulated and climate data validate our theoretical insights, demonstrating the effectiveness of our approach in capturing complex distributions.
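The forward/reverse construction described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a Gaussian, variance-preserving forward step as one example of a diffusion-based forward process, and it assumes the energy score as the scoring rule. The names `forward_process`, `energy_loss`, and `reverse_step`, as well as the linear form of `reverse_step` and all numeric values, are hypothetical placeholders. In practice each reverse step would be a trained neural generator.

```python
import numpy as np


def forward_process(x0, alphas, rng):
    """Noise the data step by step and return the path [x_0, ..., x_T].

    Assumed forward step: x_t = sqrt(a_t) * x_{t-1} + sqrt(1 - a_t) * noise.
    """
    path = [x0]
    for a in alphas:
        noise = rng.standard_normal(x0.shape)
        path.append(np.sqrt(a) * path[-1] + np.sqrt(1.0 - a) * noise)
    return path


def energy_loss(target, sample_a, sample_b):
    """Two-sample estimate of the energy-score loss.

    Averages 0.5 * (||y - g|| + ||y - g'||) - 0.5 * ||g - g'|| over rows,
    where g and g' are two independent draws from the generator.
    """
    fit = 0.5 * (np.linalg.norm(target - sample_a, axis=1)
                 + np.linalg.norm(target - sample_b, axis=1))
    spread = 0.5 * np.linalg.norm(sample_a - sample_b, axis=1)
    return float(np.mean(fit - spread))


def reverse_step(x_t, weight, scale, rng):
    """Hypothetical stand-in for one engression model: maps (x_t, noise) to x_{t-1}."""
    return weight * x_t + scale * rng.standard_normal(x_t.shape)


rng = np.random.default_rng(0)
x0 = rng.standard_normal((512, 2)) + 3.0   # toy samples standing in for the target
alphas = [0.7, 0.7, 0.7]                   # assumed schedule with T = 3 steps
path = forward_process(x0, alphas, rng)

# One loss per reverse transition x_t -> x_{t-1}; each step would get its own model.
losses = []
for t in range(len(alphas), 0, -1):
    draw_a = reverse_step(path[t], 1.0, 0.5, rng)
    draw_b = reverse_step(path[t], 1.0, 0.5, rng)
    losses.append(energy_loss(path[t - 1], draw_a, draw_b))
```

Each entry of `losses` is the quantity that one step's model would minimize during training. At inference time, a sample from the known distribution would be passed through the trained steps in order from `t = T` down to `t = 1`.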