Consistency models have been proposed for fast generative modeling, achieving results competitive with diffusion and flow models. However, these methods exhibit inherent instability and limited reproducibility when trained from scratch, motivating subsequent work to explain and mitigate these issues. While these efforts have provided valuable insights, the explanations remain fragmented, and the theoretical relationships among them remain unclear. In this work, we provide a theoretical examination of consistency models by analyzing them from a flow-map perspective. This joint analysis clarifies how training stability and convergence behavior can give rise to degenerate solutions. Building on these insights, we revisit self-distillation as a practical remedy for certain forms of suboptimal convergence and reformulate it to avoid excessive gradient norms, enabling stable optimization. We further demonstrate that our strategy extends beyond image generation to diffusion-based policy learning, without relying on a pretrained diffusion model for initialization, thereby illustrating its broader applicability.