In recent years there has been increased interest in understanding the interplay between deep generative models (DGMs) and the manifold hypothesis. Research in this area focuses on understanding the reasons why commonly-used DGMs succeed or fail at learning distributions supported on unknown low-dimensional manifolds, as well as developing new models explicitly designed to account for manifold-supported data. This manifold lens provides both clarity as to why some DGMs (e.g. diffusion models and some generative adversarial networks) empirically surpass others (e.g. likelihood-based models such as variational autoencoders, normalizing flows, or energy-based models) at sample generation, and guidance for devising more performant DGMs. We carry out the first survey of DGMs viewed through this lens, making two novel contributions along the way. First, we formally establish that numerical instability of likelihoods in high ambient dimensions is unavoidable when modelling data with low intrinsic dimension. We then show that DGMs on learned representations of autoencoders can be interpreted as approximately minimizing Wasserstein distance: this result, which applies to latent diffusion models, helps justify their outstanding empirical results. The manifold lens provides a rich perspective from which to understand DGMs, and we aim to make this perspective more accessible and widespread.
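The claim that likelihoods become numerically unstable when the data's intrinsic dimension is below the ambient dimension can be illustrated with a minimal numerical sketch (not from the paper; the setup is an assumption for illustration). We fit a Gaussian by maximum likelihood to data lying near a one-dimensional line in R^3: as the off-manifold noise shrinks, the fitted covariance approaches singularity and the log-likelihood grows without bound.

```python
import numpy as np

rng = np.random.default_rng(0)


def mean_gaussian_loglik(eps: float, n: int = 1000) -> float:
    """Fit a Gaussian by MLE to points near a line in R^3 and
    return the mean log-likelihood of the training data.

    eps controls the off-manifold noise: smaller eps means the data
    is closer to a 1-D manifold embedded in 3-D ambient space."""
    # Points on a line through the origin, plus isotropic noise of scale eps.
    t = rng.normal(size=(n, 1))
    data = t * np.array([1.0, 2.0, -1.0]) + eps * rng.normal(size=(n, 3))

    # Maximum-likelihood Gaussian fit.
    mean = data.mean(axis=0)
    cov = np.cov(data, rowvar=False)

    # Mean Gaussian log-density over the training points.
    d = data.shape[1]
    diff = data - mean
    _, logdet = np.linalg.slogdet(cov)
    mahalanobis = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(cov), diff)
    return float(np.mean(-0.5 * (d * np.log(2 * np.pi) + logdet + mahalanobis)))


if __name__ == "__main__":
    # As eps -> 0 the data concentrates on the line, two covariance
    # eigenvalues shrink toward zero, and the log-likelihood diverges.
    for eps in [1e-1, 1e-3, 1e-5]:
        print(f"eps={eps:g}  mean log-likelihood={mean_gaussian_loglik(eps):.2f}")
```

The log-likelihood increases without bound as the noise scale decreases, mirroring the paper's point: when the model's support has full ambient dimension but the data does not, likelihood values are not numerically stable quantities.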