Distribution matching distillation (DMD) aligns a few-step generator with its multi-step teacher to enable high-quality generation at low inference cost. However, DMD is prone to mode collapse, since its reverse-KL formulation inherently encourages mode-seeking behavior; existing remedies typically rely on perceptual or adversarial regularization, thereby incurring substantial computational overhead and training instability. In this work, we propose a role-separated distillation framework that explicitly disentangles the roles of the distilled steps: the first step is dedicated to preserving sample diversity via a target-prediction (e.g., v-prediction) objective, while subsequent steps focus on quality refinement under the standard DMD loss, with gradients from the DMD objective blocked at the first step. We term this approach Diversity-Preserved DMD (DP-DMD). Despite its simplicity -- no perceptual backbone, no discriminator, no auxiliary networks, and no additional ground-truth images -- DP-DMD preserves sample diversity while maintaining visual quality on par with state-of-the-art methods in extensive text-to-image experiments.
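The role separation can be sketched with a toy scalar example (a hypothetical simplification: the real first and later steps are diffusion denoising passes, and the DMD loss is stood in for here by a simple squared error). The point of the sketch is the gradient routing: blocking the DMD gradient at the first step means the first-step parameters are shaped only by the diversity-preserving target-prediction loss.

```python
# Toy scalar illustration of DP-DMD-style role separation.
# theta1 parameterizes the first distilled step, theta2 a later step.
# All losses and targets here are hypothetical stand-ins for the
# actual v-prediction and DMD objectives.
def dp_dmd_grads(theta1, theta2, z, v_target, y):
    x1 = theta1 * z    # first step: maps noise z to an initial sample
    x2 = theta2 * x1   # later step: refines the first-step output

    # First step trains only on a target-prediction (v-prediction-like)
    # loss L_v = (x1 - v_target)^2, so its gradient w.r.t. theta1 is:
    g1 = 2.0 * (x1 - v_target) * z

    # Later steps train on a DMD-style loss (stand-in: (x2 - y)^2),
    # with the gradient *blocked* at the first step: x1 is treated as
    # a constant (a detach/stop-gradient), so theta1 receives no DMD
    # gradient and only theta2 is updated by this term.
    g2 = 2.0 * (x2 - y) * x1

    return g1, g2
```

With `theta1=1.0, theta2=2.0, z=0.5, v_target=0.2, y=0.5`, the first-step gradient comes entirely from the target-prediction loss (`g1 = 0.3`) while the DMD-style term updates only the later step (`g2 = 0.5`); in an autodiff framework the same routing is achieved with a stop-gradient (e.g., `detach()`) on the first-step output.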