Recent works such as REPA have shown that guiding diffusion models with external semantic features (e.g., DINO) can significantly accelerate the training of diffusion transformers (DiTs). However, this reliance on pretrained external networks introduces additional dependencies and reduces flexibility. In this work, we argue that DiTs are capable of guiding their own training, and propose \textbf{Self-Transcendence}, a simple yet effective method that achieves fast convergence using only internal feature supervision. We find that the slow convergence of DiT training stems primarily from the difficulty of representation learning in the shallow layers. To address this, we first train the DiT for a short phase (e.g., 40 epochs) by aligning its shallow features with the latent representations of the pretrained VAE, and then apply classifier-free guidance to the intermediate features to enhance their discriminative capability and semantic expressiveness. These enriched internal features, learned entirely within the model, serve as supervision signals for training a new DiT. Compared with existing self-contained methods, our approach yields a significant performance boost; it even surpasses REPA in both generation quality and convergence speed, without requiring any external pretrained models. Our method is not only more flexible across different backbones but also has the potential to be adopted for a wider range of diffusion-based generative tasks. The source code is available at https://github.com/csslc/Self-Transcendence.
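To make the first training phase concrete, below is a minimal, hypothetical sketch (not the authors' released code) of how shallow DiT features could be aligned with pretrained-VAE latents via a projection head and a cosine-similarity objective; the names ShallowAlignmentLoss, feat_dim, and latent_dim, as well as the choice of cosine similarity, are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ShallowAlignmentLoss(nn.Module):
    """Aligns tokens from an early DiT block with frozen VAE latent tokens.

    A small projection head maps DiT features into the latent dimension,
    and a negative cosine-similarity term encourages alignment (one plausible
    choice of alignment objective, assumed here for illustration).
    """

    def __init__(self, feat_dim: int, latent_dim: int):
        super().__init__()
        self.proj = nn.Linear(feat_dim, latent_dim)  # DiT feature -> VAE latent space

    def forward(self, shallow_feats: torch.Tensor, vae_latents: torch.Tensor) -> torch.Tensor:
        # shallow_feats: (B, N, feat_dim) tokens from a shallow DiT layer
        # vae_latents:   (B, N, latent_dim) tokens from the frozen, pretrained VAE
        pred = self.proj(shallow_feats)
        return -F.cosine_similarity(pred, vae_latents, dim=-1).mean()


if __name__ == "__main__":
    # Stand-in tensors; in practice this term would be added to the usual
    # diffusion training loss with some weighting coefficient.
    align = ShallowAlignmentLoss(feat_dim=768, latent_dim=4)
    feats = torch.randn(2, 256, 768)
    latents = torch.randn(2, 256, 4)
    print(align(feats, latents).item())
```

In this sketch the alignment term would be added to the standard diffusion objective during the short initial phase; after that phase, the intermediate features (optionally sharpened with classifier-free guidance, as described above) would replace the VAE latents as the supervision target for a new DiT.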