We identify and analyze a surprising phenomenon of Latent Diffusion Models (LDMs) in which the final steps of the diffusion can degrade sample quality. In contrast to conventional arguments that justify early stopping for numerical stability, this phenomenon is intrinsic to the dimensionality reduction in LDMs. We provide a principled explanation by analyzing the interaction between latent dimension and stopping time. Under a Gaussian framework with linear autoencoders, we characterize the conditions under which early stopping is needed to minimize the distance between the generated and target distributions. More precisely, we show that lower-dimensional representations benefit from earlier termination, whereas higher-dimensional latent spaces require later stopping times. We further establish that the latent dimension interacts with other hyperparameters of the problem, such as constraints on the parameters of score matching. Experiments on synthetic and real datasets illustrate these properties, showing that early stopping can improve generative quality. Together, our results offer a theoretical foundation for understanding how the latent dimension influences sample quality, and they highlight the stopping time as a key hyperparameter in LDMs.