Generative models have recently advanced significantly, driven largely by diffusion models. The success of these models is often attributed to guidance techniques, such as classifier guidance and classifier-free guidance, which provide effective mechanisms for trading off fidelity against diversity. However, these methods cannot make a generated image aware of its geometric configuration, e.g., depth, which hinders their application in areas that require depth awareness. To address this limitation, we propose a novel guidance method for diffusion models that uses estimated depth information derived from the rich intermediate representations of diffusion models. We first present a label-efficient depth estimation framework that uses the internal representations of diffusion models. We then incorporate two guidance techniques during the sampling phase, one based on pseudo-labeling and one based on a depth-domain diffusion prior, to self-condition the generated image on the estimated depth map. Experiments and comprehensive ablation studies demonstrate the effectiveness of our method in guiding diffusion models toward generating geometrically plausible images.