We identify and analyze a surprising phenomenon in Latent Diffusion Models (LDMs): the final steps of the diffusion can degrade sample quality. In contrast to conventional arguments that justify early stopping for numerical stability, this phenomenon is intrinsic to the dimensionality reduction in LDMs. We provide a principled explanation by analyzing the interaction between latent dimension and stopping time. Under a Gaussian framework with linear autoencoders, we characterize the conditions under which early stopping is needed to minimize the distance between the generated and target distributions. More precisely, we show that lower-dimensional representations benefit from earlier termination, whereas higher-dimensional latent spaces require a later stopping time. We further establish that the latent dimension interacts with other hyperparameters of the problem, such as constraints on the score-matching parameters. Experiments on synthetic and real datasets illustrate these properties, underlining that early stopping can improve generative quality. Together, our results offer a theoretical foundation for understanding how the latent dimension influences sample quality, and they highlight the stopping time as a key hyperparameter in LDMs.