Conditional diffusion models have shown remarkable success in visual content generation, producing high-quality samples across various domains, largely due to classifier-free guidance (CFG). Recent attempts to extend guidance to unconditional models have relied on heuristic techniques, resulting in suboptimal generation quality and unintended effects. In this work, we propose Smoothed Energy Guidance (SEG), a novel training- and condition-free approach that leverages the energy-based perspective of the self-attention mechanism to enhance image generation. By defining the energy of self-attention, we introduce a method to reduce the curvature of the energy landscape of attention and use the output as the unconditional prediction. Practically, we control the curvature of the energy landscape by adjusting the Gaussian kernel parameter while keeping the guidance scale parameter fixed. Additionally, we present a query blurring method that is equivalent to blurring the entire attention weights without incurring quadratic complexity in the number of tokens. In our experiments, SEG achieves a Pareto improvement in both quality and the reduction of side effects. The code is available at https://github.com/SusungHong/SEG-SDXL.
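The query blurring claim can be illustrated with a small NumPy sketch. It is not the authors' implementation (that lives in the linked repository). It assumes that "attention weights" means the pre-softmax scores QK^T, that tokens lie on an h x w grid in row-major order, and that the blur uses a truncated Gaussian kernel with edge-replicated padding. All function names and sizes here are hypothetical. Under these assumptions, blurring is a linear map G over the query axis, so G(QK^T) = (GQ)K^T, and the blur can be applied to the queries directly.

```python
import numpy as np

def gaussian_kernel_1d(sigma):
    """Normalized 1D Gaussian kernel truncated at 3*sigma (an assumed cutoff)."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def blur_matrix_1d(n, sigma):
    """n x n matrix that applies the 1D blur with edge-replicated padding."""
    k = gaussian_kernel_1d(sigma)
    r = len(k) // 2
    B = np.zeros((n, n))
    for i in range(n):
        for j, weight in enumerate(k):
            src = min(max(i + j - r, 0), n - 1)  # clamp index at the borders
            B[i, src] += weight
    return B

def blur_matrix_2d(h, w, sigma):
    """Dense (h*w) x (h*w) blur matrix; quadratic in tokens, for reference only."""
    return np.kron(blur_matrix_1d(h, sigma), blur_matrix_1d(w, sigma))

def blur_queries(Q, h, w, sigma):
    """Separable blur of the queries over the token grid, no (h*w)^2 matrix."""
    Bh, Bw = blur_matrix_1d(h, sigma), blur_matrix_1d(w, sigma)
    Qg = Q.reshape(h, w, -1)
    Qg = np.einsum("ij,jkd->ikd", Bh, Qg)  # blur along height
    Qg = np.einsum("ij,kjd->kid", Bw, Qg)  # blur along width
    return Qg.reshape(h * w, -1)

def softmax(x):
    """Row-wise softmax with max subtraction for numerical stability."""
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
h, w, d, sigma = 4, 4, 8, 1.0  # toy sizes, chosen only for illustration
Q = rng.normal(size=(h * w, d))
K = rng.normal(size=(h * w, d))
V = rng.normal(size=(h * w, d))
scale = 1.0 / np.sqrt(d)

# Path A: blur the full score matrix along the query axis (quadratic cost).
scores_blurred = blur_matrix_2d(h, w, sigma) @ (Q @ K.T * scale)
# Path B: blur the queries first, then compute the scores.
scores_from_q = blur_queries(Q, h, w, sigma) @ K.T * scale

equivalent = np.allclose(scores_blurred, scores_from_q)
out_smoothed = softmax(scores_from_q) @ V  # attention output with smoothed scores
```

In this sketch, `sigma` plays the role of the Gaussian kernel parameter that the abstract says is adjusted to control curvature. A larger `sigma` flattens the scores across query positions. How the smoothed output is combined with the original prediction under a fixed guidance scale is not shown here.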