Stochastic Interpolants (SI) are a powerful framework for generative modeling, capable of flexibly transforming between two probability distributions. However, their use in jointly optimized latent variable models remains unexplored, because SI requires direct access to samples from both distributions. This work presents Latent Stochastic Interpolants (LSI), which enable joint learning in a latent space with end-to-end optimized encoder, decoder, and latent SI models. We achieve this by developing a principled Evidence Lower Bound (ELBO) objective derived directly in continuous time. The joint optimization allows LSI to learn effective latent representations along with a generative process that transforms an arbitrary prior distribution into the encoder-defined aggregated posterior. LSI sidesteps the simple priors of standard diffusion models and mitigates the computational demands of applying SI directly in high-dimensional observation spaces, while preserving the generative flexibility of the SI framework. We demonstrate the efficacy of LSI through comprehensive experiments on the standard large-scale ImageNet generation benchmark.