Diffusion models (DMs) have become the leading choice for generative tasks across diverse domains. However, their reliance on multiple sequential forward passes significantly limits real-time performance. Previous acceleration methods have primarily focused on reducing the number of sampling steps or reusing intermediate results, and they fail to exploit variations across spatial regions within the image because of the constraints of convolutional U-Net structures. By harnessing the flexibility of Diffusion Transformers (DiTs) in handling a variable number of tokens, we introduce RAS, a novel, training-free sampling strategy that dynamically assigns different sampling ratios to regions within an image based on the focus of the DiT model. Our key observation is that during each sampling step, the model concentrates on semantically meaningful regions, and these areas of focus exhibit strong continuity across consecutive steps. Leveraging this insight, RAS updates only the regions currently in focus, while other regions are updated using cached noise from the previous step. The model's focus is determined from the output of the preceding step, capitalizing on the temporal consistency we observed. We evaluate RAS on Stable Diffusion 3 and Lumina-Next-T2I, achieving speedups of up to 2.36x and 2.51x, respectively, with minimal degradation in generation quality. Additionally, a user study shows that RAS delivers comparable quality under human evaluation while achieving a 1.6x speedup. Our approach takes a significant step towards more efficient diffusion transformers, enhancing their potential for real-time applications.
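The update rule described above (refresh only the tokens in focus, reuse cached noise elsewhere) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The names `select_focus_tokens`, `ras_step`, and `model_fn` are hypothetical, the focus score (per-token standard deviation of the previous noise prediction) is an assumption chosen for illustration, and the simple Euler-style update stands in for whatever scheduler a real pipeline would use.

```python
import numpy as np

def select_focus_tokens(prev_noise, sample_ratio):
    """Pick the tokens treated as 'in focus' for the current step.

    Assumption: focus is scored by the per-token standard deviation of the
    noise predicted at the previous step; the abstract only says that focus
    is derived from the preceding step's output.
    """
    scores = prev_noise.std(axis=1)
    k = max(1, int(round(sample_ratio * prev_noise.shape[0])))
    # Indices of the k highest-scoring tokens, returned in ascending order.
    return np.sort(np.argsort(scores)[-k:])

def ras_step(tokens, noise_cache, model_fn, step_size, sample_ratio):
    """One hypothetical region-adaptive sampling step.

    tokens:      (N, D) array of latent tokens
    noise_cache: (N, D) noise predicted at the previous step
    model_fn:    hypothetical callable (selected_tokens, indices) -> noise
    """
    active = select_focus_tokens(noise_cache, sample_ratio)
    noise = noise_cache.copy()
    # Only the focused tokens go through the (expensive) model call.
    noise[active] = model_fn(tokens[active], active)
    # All tokens are updated; unfocused ones reuse their cached noise.
    # Assumption: a plain Euler-style update, for illustration only.
    new_tokens = tokens - step_size * noise
    return new_tokens, noise, active

def toy_model(selected_tokens, indices):
    """Stand-in for a DiT forward pass on a subset of tokens."""
    return 0.5 * selected_tokens
```

Because `model_fn` receives only the selected tokens, the cost of each step in this sketch scales with `sample_ratio` rather than with the full token count, which is the property that a transformer's variable-length token handling makes possible.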