During image editing, existing deep generative models tend to re-synthesize the entire output from scratch, including the unedited regions. This wastes a significant amount of computation, especially for minor edits. In this work, we present Spatially Sparse Inference (SSI), a general-purpose technique that selectively performs computation for edited regions and accelerates various generative models, including both conditional GANs and diffusion models. Our key observation is that users tend to edit the input image gradually. This motivates us to cache and reuse the feature maps of the original image. Given an edited image, we sparsely apply the convolutional filters to the edited regions while reusing the cached features for the unedited areas. Based on our algorithm, we further propose the Sparse Incremental Generative Engine (SIGE) to convert the computation reduction into latency reduction on off-the-shelf hardware. With edits covering about $1\%$ of the image area, SIGE accelerates DDPM by $3.0\times$ on an NVIDIA RTX 3090 and $4.6\times$ on an Apple M1 Pro GPU, Stable Diffusion by $7.2\times$ on the 3090, and GauGAN by $5.6\times$ on the 3090 and $5.2\times$ on the M1 Pro GPU. Compared to our conference version, we extend SIGE to accommodate attention layers and apply it to Stable Diffusion. Additionally, we add support for the Apple M1 Pro GPU and include more results with large and sequential edits.
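The cache-and-reuse idea described above can be illustrated with a minimal sketch. It is not the authors' SIGE implementation: it assumes a single-channel image stored as nested lists, a zero-padded 3x3 convolution, and a fixed square tile size, and all function names (`conv3x3_at`, `dense_conv3x3`, `active_tiles`, `sparse_conv3x3`) are hypothetical. The output of the original image is computed once and cached; after an edit, only tiles whose pixels (or 1-pixel halo) differ from the original are recomputed, and every other output value is copied from the cache.

```python
def conv3x3_at(img, kernel, y, x):
    """Output of a zero-padded 3x3 convolution at a single pixel."""
    h, w = len(img), len(img[0])
    total = 0.0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            yy, xx = y + dy, x + dx
            if 0 <= yy < h and 0 <= xx < w:
                total += kernel[dy + 1][dx + 1] * img[yy][xx]
    return total


def dense_conv3x3(img, kernel):
    """Baseline: recompute every output pixel from scratch."""
    h, w = len(img), len(img[0])
    return [[conv3x3_at(img, kernel, y, x) for x in range(w)] for y in range(h)]


def active_tiles(original, edited, tile):
    """Top-left corners of tiles whose outputs may have changed.

    A tile is active if any pixel inside it, or inside its 1-pixel halo
    (the reach of a 3x3 kernel), differs between the two images.
    """
    h, w = len(original), len(original[0])
    tiles = []
    for ty in range(0, h, tile):
        for tx in range(0, w, tile):
            y0, y1 = max(ty - 1, 0), min(ty + tile + 1, h)
            x0, x1 = max(tx - 1, 0), min(tx + tile + 1, w)
            if any(original[y][x] != edited[y][x]
                   for y in range(y0, y1) for x in range(x0, x1)):
                tiles.append((ty, tx))
    return tiles


def sparse_conv3x3(edited, cached_output, kernel, tiles, tile):
    """Recompute only the active tiles; reuse the cache everywhere else."""
    h, w = len(edited), len(edited[0])
    out = [row[:] for row in cached_output]  # start from the cached features
    for ty, tx in tiles:
        for y in range(ty, min(ty + tile, h)):
            for x in range(tx, min(tx + tile, w)):
                out[y][x] = conv3x3_at(edited, kernel, y, x)
    return out


# Illustrative data (assumed, not from the paper): a 16x16 image in 4x4 tiles.
size, tile = 16, 4
original = [[float((3 * y + 5 * x) % 7) for x in range(size)] for y in range(size)]
kernel = [[1.0, 2.0, 1.0], [0.0, 0.0, 0.0], [-1.0, -2.0, -1.0]]
cached = dense_conv3x3(original, kernel)  # computed once for the original image

edited = [row[:] for row in original]
edited[5][6] += 10.0  # a small local edit

tiles = active_tiles(original, edited, tile)
result = sparse_conv3x3(edited, cached, kernel, tiles, tile)
```

In this toy setting the sparse result matches a full recomputation exactly, because the halo check covers every output pixel whose receptive field touches an edited pixel. Turning the reduced computation into lower latency on real hardware is the harder engineering problem, and it is the one the abstract attributes to SIGE.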