ScaleLong: Towards More Stable Training of Diffusion Model via Scaling Network Long Skip Connection

In diffusion models, UNet is the most popular network backbone because its long skip connections (LSCs), which connect distant network blocks, aggregate long-distance information and alleviate vanishing gradients. Unfortunately, UNet often suffers from unstable training in diffusion models, which can be alleviated by scaling its LSC coefficients down. However, a theoretical understanding of both the instability of UNet in diffusion models and the performance improvement brought by LSC scaling is still missing. To address this gap, we theoretically show that the LSC coefficients in UNet strongly affect the stability of forward and backward propagation and the robustness of UNet. Specifically, the hidden features and gradients of UNet at any layer can oscillate, and their oscillation ranges are in fact large, which explains the instability of UNet training. Moreover, UNet is provably sensitive to perturbed input and predicts an output far from the desired one, yielding an oscillatory loss and thus oscillatory gradients. We also show the theoretical benefits of scaling the LSC coefficients of UNet for the stability of hidden features and gradients and for robustness. Finally, inspired by our theory, we propose ScaleLong, an effective coefficient scaling framework that scales the LSC coefficients in UNet and improves its training stability. Experimental results on four well-known datasets show that our methods are superior in stabilizing training and yield about 1.5x training acceleration on different diffusion models with UNet or UViT backbones. Code: https://github.com/sail-sg/ScaleLong
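To make the idea of LSC coefficient scaling concrete, the following is a minimal sketch and not the released ScaleLong implementation. It assumes that skips are merged by addition (practical UNets often concatenate instead), that features are plain lists of floats, and that the coefficients follow a geometric schedule kappa**i with the outermost skip at i = 0. The names `lsc_coefficients`, `unet_forward`, and `kappa` are hypothetical and chosen only for illustration; the abstract itself does not specify the schedule.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Block = Callable[[Vector], Vector]


def lsc_coefficients(num_skips: int, kappa: float = 0.7) -> List[float]:
    """Assumed geometric schedule: the i-th long skip is scaled by kappa**i."""
    if not 0.0 < kappa <= 1.0:
        raise ValueError("kappa must lie in (0, 1]")
    return [kappa ** i for i in range(num_skips)]


def unet_forward(
    x: Vector,
    encoders: Sequence[Block],
    middle: Block,
    decoders: Sequence[Block],
    coeffs: Sequence[float],
) -> Vector:
    """Toy UNet pass where encoder i feeds decoder i through a scaled skip."""
    if not (len(encoders) == len(decoders) == len(coeffs)):
        raise ValueError("need one decoder and one coefficient per encoder")

    # Encoder path: store each block's output for its long skip connection.
    skips: List[Vector] = []
    h = x
    for enc in encoders:
        h = enc(h)
        skips.append(h)

    h = middle(h)

    # Decoder path: visit levels from deepest to outermost and add the
    # skip feature scaled by its coefficient (additive merge is an assumption).
    for i in reversed(range(len(encoders))):
        merged = [a + coeffs[i] * b for a, b in zip(h, skips[i])]
        h = decoders[i](merged)
    return h


def identity(v: Vector) -> Vector:
    """Stand-in for a real network block, used only for the demo below."""
    return list(v)


if __name__ == "__main__":
    blocks = [identity, identity]
    x = [1.0, 2.0]
    unscaled = unet_forward(x, blocks, identity, blocks, lsc_coefficients(2, 1.0))
    scaled = unet_forward(x, blocks, identity, blocks, lsc_coefficients(2, 0.5))
    print("unscaled:", unscaled)
    print("scaled:  ", scaled)
```

With identity blocks the effect is easy to see: every skip adds a copy of the input, so shrinking the coefficients directly shrinks the magnitude of the output, which is the quantity whose oscillation range the abstract ties to the LSC coefficients.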
