We consider the problem of sampling from an unknown distribution for which only a sufficiently large number of training samples are available. Such settings have recently drawn considerable interest in the context of generative modelling. In this paper, we propose a generative model combining diffusion maps and Langevin dynamics. Diffusion maps are used to approximate the drift term from the available training samples, which is then implemented in a discrete-time Langevin sampler to generate new samples. By setting the kernel bandwidth to match the time step size used in the unadjusted Langevin algorithm, our method effectively circumvents any stability issues typically associated with time-stepping stiff stochastic differential equations. More precisely, we introduce a novel split-step scheme, ensuring that the generated samples remain within the convex hull of the training samples. Our framework can be naturally extended to generate conditional samples. We demonstrate the performance of our proposed scheme through experiments on synthetic datasets with increasing dimensions and on a stochastic subgrid-scale parametrization conditional sampling problem.
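To illustrate why matching the kernel bandwidth to the step size keeps samples inside the convex hull, the following is a minimal sketch, not the authors' scheme. It assumes a plain Gaussian kernel density estimate in place of the full diffusion-map approximation of the drift. For a Gaussian kernel with variance eps, the estimated score at x is (sum_i w_i(x) x_i - x) / eps, where the w_i are normalized kernel weights. An Euler drift step of size eps therefore lands exactly on the weighted mean sum_i w_i(x) x_i, which is a convex combination of the training samples. The names `kernel_weights`, `split_step`, `eps`, and the toy circle dataset are all hypothetical and chosen only for illustration.

```python
import math
import random

def kernel_weights(x, train, eps):
    """Normalized Gaussian kernel weights of x against every training sample."""
    logits = [-sum((a - b) ** 2 for a, b in zip(x, t)) / (2.0 * eps) for t in train]
    top = max(logits)  # subtract the max so the exponentials cannot overflow
    raw = [math.exp(l - top) for l in logits]
    total = sum(raw)
    return [r / total for r in raw]

def split_step(x, train, eps, rng):
    """One noise half-step followed by a kernel-weighted averaging half-step."""
    # Half-step 1: Langevin noise with step size eps.
    scale = math.sqrt(2.0 * eps)
    x_half = [xi + scale * rng.gauss(0.0, 1.0) for xi in x]
    # Half-step 2: with bandwidth equal to eps, the drift step reduces to
    # the weighted mean of the training samples (a convex combination).
    w = kernel_weights(x_half, train, eps)
    x_new = [sum(wi * t[d] for wi, t in zip(w, train)) for d in range(len(x))]
    return x_new, w

rng = random.Random(0)

# Hypothetical 2-D training set: noisy points on the unit circle.
train = []
for _ in range(200):
    theta = rng.uniform(0.0, 2.0 * math.pi)
    train.append([math.cos(theta) + 0.05 * rng.gauss(0.0, 1.0),
                  math.sin(theta) + 0.05 * rng.gauss(0.0, 1.0)])

eps = 0.01                    # assumed value; bandwidth and step size coincide
x = list(rng.choice(train))   # start the chain at a training sample
for _ in range(500):
    x, w = split_step(x, train, eps, rng)
```

Because the second half-step is an average with non-negative weights that sum to one, the iterate cannot leave the convex hull of the training data regardless of how large `eps` is, which is the stability property the abstract refers to. Applying the noise first and the averaging second is a design choice in this sketch so that the returned sample is the one guaranteed to lie in the hull.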