Text-to-image generation executes a diffusion workflow comprising multiple models centered on a base diffusion model. Existing serving systems treat each workflow as an opaque monolith, provisioning, placing, and scaling all constituent models together, which obscures internal dataflow, prevents model sharing, and enforces coarse-grained resource management. In this paper, we make a case for micro-serving diffusion workflows with LegoDiffusion, a system that decomposes a workflow into loosely coupled model-execution nodes that can be independently managed and scheduled. By explicitly managing individual model inference, LegoDiffusion unlocks cluster-scale optimizations, including per-model scaling, model sharing, and adaptive model parallelism. Together, these optimizations let LegoDiffusion outperform existing diffusion workflow serving systems, sustaining up to 3x higher request rates and tolerating up to 8x higher burst traffic.
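To make the contrast between monolithic serving and micro-serving concrete, the following is a minimal Python sketch of a workflow decomposed into model-execution nodes. It is an illustration under stated assumptions, not LegoDiffusion's actual interface: the names `ModelNode`, `Workflow`, and `models_to_provision`, as well as the example models (`text_encoder`, `vae_decoder`, `controlnet`), are hypothetical and do not come from the paper. The sketch only counts distinct model copies and ignores load, placement, and parallelism.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ModelNode:
    """One model-execution step in a workflow (hypothetical structure)."""
    name: str   # step name within the workflow
    model: str  # identifier of the model this step runs


@dataclass
class Workflow:
    """A workflow as a small dataflow graph of model-execution nodes."""
    name: str
    nodes: list
    edges: list = field(default_factory=list)  # (producer step, consumer step)


def models_to_provision(workflows, share):
    """Count the model copies needed to host the given workflows.

    share=False mimics monolithic serving: every workflow brings its own
    copy of each model it uses.
    share=True mimics micro-serving: a model used by several workflows
    is hosted once and shared.
    """
    counts = {}
    for wf in workflows:
        for model in {node.model for node in wf.nodes}:
            if share:
                counts[model] = 1
            else:
                counts[model] = counts.get(model, 0) + 1
    return counts


# Two illustrative workflows built around the same base diffusion model.
basic = Workflow(
    name="basic",
    nodes=[
        ModelNode("encode", "text_encoder"),
        ModelNode("denoise", "base_diffusion"),
        ModelNode("decode", "vae_decoder"),
    ],
    edges=[("encode", "denoise"), ("denoise", "decode")],
)
guided = Workflow(
    name="guided",
    nodes=[
        ModelNode("encode", "text_encoder"),
        ModelNode("control", "controlnet"),
        ModelNode("denoise", "base_diffusion"),
        ModelNode("decode", "vae_decoder"),
    ],
    edges=[("encode", "denoise"), ("control", "denoise"), ("denoise", "decode")],
)

monolithic = models_to_provision([basic, guided], share=False)
shared = models_to_provision([basic, guided], share=True)
```

In this toy setting, the monolithic count holds two copies each of the text encoder, base diffusion model, and decoder, while the shared count holds one copy of each model. Exposing the graph in this way is also what would allow a scheduler to scale or parallelize a single node without touching the rest of the workflow.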