Generative modeling aims to generate new data samples that resemble a given dataset. When diffusion models are used for this task, one of the main challenges is operating in the input space, which tends to be very high-dimensional. To address this, recent approaches run the diffusion model in a latent space, using an encoder that maps the data space to a lower-dimensional latent space; this improves training efficiency and has achieved state-of-the-art results. The variational autoencoder (VAE) is the most commonly used encoder/decoder framework in this domain, known for its ability to learn latent representations and generate data samples. In this paper, we introduce a novel encoder/decoder framework with theoretical properties distinct from those of the VAE, specifically designed to preserve the geometric structure of the data distribution. We demonstrate the significant advantages of this geometry-preserving encoder in the training of both the encoder and the decoder. Additionally, we provide theoretical results on the convergence of the training process: convergence guarantees for encoder training, and results showing that decoder training converges faster when the geometry-preserving encoder is used.