Generating samples from a probability distribution is a fundamental task in machine learning and statistics. This article proposes a novel scheme for sampling from a distribution whose probability density $\mu({\bf x})$, ${\bf x}\in{\mathbb{R}}^d$, is unknown but from which a finite number of independent samples are available. We construct a Schr\"odinger Bridge (SB) diffusion process on the finite horizon $t\in[0,1]$ that induces a probability evolution starting from a fixed point at $t=0$ and ending at the target distribution $\mu({\bf x})$ at $t=1$. The diffusion process is characterized by a stochastic differential equation whose drift function can be estimated solely from data samples through a simple one-step procedure. Compared with the classical iterative schemes developed for the SB problem, our methodology is simple, efficient, and computationally inexpensive: it does not require training a neural network and thus avoids many of the challenges of designing a network architecture. We evaluate the performance of the new generative model through a series of numerical experiments on multi-modal low-dimensional simulated data and high-dimensional benchmark image data. Experimental results indicate that the synthetic samples generated by our SB-based algorithm are comparable to those generated by state-of-the-art methods in the field. Our formulation opens up new opportunities for developing efficient diffusion models that can be applied directly to large-scale real-world data.
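To illustrate how a drift can be computed directly from samples without any network training, the following is a minimal sketch under stated assumptions, not the article's estimator. We assume the reference process is a standard Brownian motion started at the origin, and we replace $\mu$ by the empirical measure of the given samples ${\bf y}_1,\dots,{\bf y}_n$. Under these assumptions the bridge is a mixture of Brownian bridges from ${\bf 0}$ to each ${\bf y}_i$, and its drift has the closed form $b({\bf x},t)=\sum_i w_i({\bf y}_i-{\bf x})/(1-t)$ with $w_i\propto\exp\!\big(-\|{\bf x}-t{\bf y}_i\|^2/(2t(1-t))\big)$. The function names `sb_drift` and `generate`, the Euler–Maruyama discretization, and the step count are hypothetical choices made for this example. Note that with the raw empirical measure the terminal samples collapse onto training points; a practical estimator would presumably smooth $\mu$ in some way.

```python
import numpy as np


def sb_drift(x, t, samples):
    """Hypothetical drift of a mixture of Brownian bridges from the origin
    (t = 0) to the empirical measure of `samples` (t = 1).

    Assumes a standard Brownian reference process; `samples` has shape (n, d).
    """
    if t == 0.0:
        # At t = 0 the posterior over endpoints is uniform, so the drift is mean(y) - x.
        return samples.mean(axis=0) - x
    diff = x[None, :] - t * samples                      # rows: x - t * y_i
    logw = -np.sum(diff**2, axis=1) / (2.0 * t * (1.0 - t))
    logw -= logw.max()                                   # stabilize the softmax
    w = np.exp(logw)
    w /= w.sum()                                         # posterior weights over endpoints
    return (w @ samples - x) / (1.0 - t)


def generate(samples, n_samples, n_steps=200, seed=0):
    """Simulate the SDE dX = b(X, t) dt + dW from X_0 = 0 with Euler-Maruyama."""
    rng = np.random.default_rng(seed)
    d = samples.shape[1]
    h = 1.0 / n_steps
    out = np.empty((n_samples, d))
    for j in range(n_samples):
        x = np.zeros(d)                                  # fixed starting point at t = 0
        for k in range(n_steps):
            t = k * h                                    # left endpoint, so 1 - t >= h > 0
            x = x + sb_drift(x, t, samples) * h + np.sqrt(h) * rng.standard_normal(d)
        out[j] = x
    return out
```

For example, with two well-separated data points `np.array([[-5.0, 0.0], [5.0, 0.0]])`, each path commits to one of the two modes as $t\to1$, and the output lands close to that point up to noise of order $\sqrt{h}$. On the last step $h/(1-t)=1$, so the update jumps onto the weighted endpoint mean instead of overshooting it.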