This paper documents our characterization study and practices for serving text-to-image requests with stable diffusion models in production. We begin with a comprehensive analysis of inference request traces from commercial text-to-image applications. Our analysis reveals that add-on modules, i.e., ControlNets and LoRAs, which augment the base stable diffusion models, are ubiquitous in image generation for commercial applications. Despite their efficacy, these add-on modules incur high loading overhead, prolong serving latency, and consume substantial GPU resources. Driven by our characterization study, we present SwiftDiffusion, a system that efficiently generates high-quality images using stable diffusion models with add-on modules. To achieve this, SwiftDiffusion restructures the existing text-to-image serving workflow by identifying opportunities for parallel computation and distributing ControlNet computations across multiple GPUs. Furthermore, SwiftDiffusion thoroughly analyzes the dynamics of image generation and develops techniques that eliminate the overhead of LoRA loading and patching while preserving image quality. Finally, SwiftDiffusion proposes specialized optimizations to the backbone architecture of the stable diffusion models, which remain compatible with the efficient serving of add-on modules. Compared with state-of-the-art text-to-image serving systems, SwiftDiffusion reduces serving latency by up to 5x and improves serving throughput by up to 2x without compromising image quality.