Diffusion models have achieved great success in synthesizing high-quality images. However, generating high-resolution images with diffusion models remains challenging due to the enormous computational cost, which results in prohibitive latency for interactive applications. In this paper, we propose DistriFusion to tackle this problem by leveraging parallelism across multiple GPUs. Our method splits the model input into multiple patches and assigns each patch to a GPU. However, na\"{\i}vely implementing such an algorithm breaks the interaction between patches and loses fidelity, while incorporating such an interaction incurs tremendous communication overhead. To overcome this dilemma, we observe the high similarity between the inputs of adjacent diffusion steps and propose displaced patch parallelism, which takes advantage of the sequential nature of the diffusion process by reusing the pre-computed feature maps from the previous timestep to provide context for the current step. Consequently, our method supports asynchronous communication, which can be overlapped with computation. Extensive experiments show that our method can be applied to the recent Stable Diffusion XL with no quality degradation and achieves up to a 6.1$\times$ speedup on eight NVIDIA A100s compared to one. Our code is publicly available at https://github.com/mit-han-lab/distrifuser.
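The contrast between naive patch parallelism and displaced patch parallelism can be illustrated with a toy, single-process simulation. This is a minimal sketch under strong simplifying assumptions, not the DistriFusion implementation: the "image" is a 1D list, the "feature map" is the input itself, the denoiser is a hypothetical 3-tap smoothing update (`update`, `ETA`), and the cross-GPU all-gather is replaced by keeping a reference to the previous step's patches. The names `reference`, `patch_parallel`, and `warmup` are illustrative only; `warmup` (an assumption here) runs the first steps with fresh, synchronous context, since no previous-step activations exist yet.

```python
from typing import List

ETA = 0.2  # hypothetical step size of the toy "denoiser"


def update(x: List[float], left: float, right: float) -> List[float]:
    """Toy denoising step on one segment: blend each value with the mean of
    itself and its two neighbors. `left`/`right` are the values just outside
    the segment, i.e. the context that must come from other patches."""
    p = [left] + x + [right]
    s = [(p[i - 1] + p[i] + p[i + 1]) / 3.0 for i in range(1, len(p) - 1)]
    return [xi - ETA * (xi - si) for xi, si in zip(x, s)]


def reference(x0: List[float], steps: int) -> List[float]:
    """Single-device baseline (replicate padding at the image borders)."""
    x = list(x0)
    for _ in range(steps):
        x = update(x, x[0], x[-1])
    return x


def patch_parallel(x0: List[float], steps: int, n: int,
                   displaced: bool, warmup: int = 1) -> List[float]:
    """Simulate n hypothetical 'devices', each owning one contiguous patch.
    Assumes len(x0) is divisible by n."""
    size = len(x0) // n
    patches = [x0[k * size:(k + 1) * size] for k in range(n)]
    stale = patches  # snapshot of every patch from the previous step
    for t in range(steps):
        new_patches = []
        for k, local in enumerate(patches):
            if not displaced:
                # Naive: no communication; patch edges see only themselves.
                left, right = local[0], local[-1]
            else:
                # Displaced: fresh context during warm-up, then reuse the
                # previous step's neighbors so the exchange can be async.
                src = patches if t < warmup else stale
                left = src[k - 1][-1] if k > 0 else local[0]
                right = src[k + 1][0] if k < n - 1 else local[-1]
            new_patches.append(update(local, left, right))
        # A real system would launch a non-blocking all-gather of `patches`
        # here and consume the result one step later; we keep a reference.
        stale, patches = patches, new_patches
    return [v for p in patches for v in p]


if __name__ == "__main__":
    x0 = [float(i) for i in range(32)]  # smooth ramp "image", 4 patches of 8
    ref = reference(x0, steps=10)
    for name, disp in [("naive", False), ("displaced", True)]:
        out = patch_parallel(x0, steps=10, n=4, displaced=disp)
        print(name, max(abs(a - b) for a, b in zip(out, ref)))
```

With a smooth input, the naive variant deviates from the single-device result at every patch seam, whereas the displaced variant stays far closer, because the neighbor values it reuses have barely changed since the previous step. Setting `warmup=steps` recovers the fully synchronous schedule, which matches the baseline exactly but would block on communication at every step.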