Diffusion models have shown remarkable success in text-to-image generation, making alignment methods for these models increasingly important. A key challenge is the sparsity of preference labels, which are typically available only at the end of a denoising trajectory. This raises the question of how to assign credit across denoising steps based on these sparse labels. In this paper, we propose Denoised Distribution Estimation (DDE), a novel method for credit assignment. Unlike previous approaches that rely on auxiliary models or hand-crafted schemes, DDE derives its credit assignment strategy more explicitly. DDE directly estimates the terminal denoised distribution from the perspective of each step. It is equipped with two estimation strategies and can represent the entire denoising trajectory with a single model inference. Theoretically and empirically, we show that DDE prioritizes optimizing the middle part of the denoising trajectory, resulting in a novel and effective credit assignment scheme. Extensive experiments demonstrate that our approach achieves superior performance, both quantitatively and qualitatively.