Text-to-image (T2I) generation with multiple conditions enables fine-grained user control over the generated image. However, incorporating multi-condition inputs incurs substantial computation and communication overhead from the additional preprocessing subtasks and control optimizations, which leads to unacceptable generation latency. In this paper, we propose an end-edge collaborative system design that accelerates multi-condition T2I generation through adaptive condition offloading and pruning. Extensive offline profiling reveals that different conditions vary significantly in their computation and communication costs. Motivated by this, we propose a \textit{Subtask Manager} that jointly optimizes condition inference offloading and bandwidth allocation using a heuristic algorithm, balancing local and edge execution delays to minimize overall preprocessing latency. We then design a lightweight, feature-driven \textit{Conditioning Scale Estimator} that evaluates the contribution of each condition by analyzing its feature activation strength and its overlap with other conditions. This allows adaptive conditioning scale selection and pruning of insignificant conditions, thereby accelerating the denoising process. Extensive experimental results show that our system reduces latency by nearly 25\% and improves average generation quality by 6\%, outperforming the benchmark schemes.
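The balancing idea behind the Subtask Manager can be illustrated with a small sketch. This is not the paper's algorithm: the greedy rule, the `Condition` fields, and the delay model are all assumptions made for illustration. The toy model assumes local subtasks run sequentially on the device, offloaded subtasks run in parallel on the edge, and bandwidth is split in proportion to upload size so that all uploads finish together.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    # All fields are hypothetical profiling values, not figures from the paper.
    name: str
    local_s: float    # on-device inference time in seconds
    edge_s: float     # edge-server inference time in seconds
    upload_mb: float  # input size to upload in megabytes


def latency(conds, offloaded, bandwidth_mbps):
    """Preprocessing latency = max(local delay, edge delay) in the toy model."""
    local = sum(c.local_s for c in conds if c.name not in offloaded)
    remote = [c for c in conds if c.name in offloaded]
    if not remote:
        return local
    # Proportional bandwidth sharing: every upload finishes at the same time.
    upload = sum(c.upload_mb for c in remote) * 8 / bandwidth_mbps
    edge = upload + max(c.edge_s for c in remote)
    return max(local, edge)


def bandwidth_shares(conds, offloaded, bandwidth_mbps):
    """Split the link among offloaded conditions in proportion to upload size."""
    remote = [c for c in conds if c.name in offloaded]
    total = sum(c.upload_mb for c in remote)
    return {c.name: bandwidth_mbps * c.upload_mb / total for c in remote}


def greedy_offload(conds, bandwidth_mbps):
    """Move one condition at a time to the edge while latency keeps dropping."""
    offloaded = set()
    best = latency(conds, offloaded, bandwidth_mbps)
    while True:
        candidates = [
            (latency(conds, offloaded | {c.name}, bandwidth_mbps), c.name)
            for c in conds
            if c.name not in offloaded
        ]
        if not candidates:
            break
        new_best, name = min(candidates)
        if new_best >= best:
            break  # no single move improves the balance any further
        offloaded.add(name)
        best = new_best
    return offloaded, best


if __name__ == "__main__":
    conds = [
        Condition("depth", local_s=2.0, edge_s=0.4, upload_mb=1.0),
        Condition("canny", local_s=0.1, edge_s=0.05, upload_mb=1.0),
        Condition("pose", local_s=1.5, edge_s=0.3, upload_mb=1.0),
    ]
    chosen, total = greedy_offload(conds, bandwidth_mbps=80.0)
    print(sorted(chosen), total, bandwidth_shares(conds, chosen, 80.0))
```

In this toy setting the cheap condition stays on the device, because uploading it would cost more than running it locally, while the expensive ones move to the edge. That is the kind of cost diversity the profiling result points to.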
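The Conditioning Scale Estimator scores each condition by its feature activation strength and its overlap with other conditions. A minimal sketch of that idea follows. The scoring formula (strength times one minus mean overlap), the IoU-based overlap measure, the thresholds, and the function names are all assumptions for illustration and are not taken from the paper. Feature maps are represented as flat lists of floats of equal length.

```python
def strength(feat):
    """Mean absolute activation of a flattened feature map."""
    return sum(abs(v) for v in feat) / len(feat)


def overlap(a, b, thresh):
    """IoU of the regions where each map's |activation| exceeds thresh."""
    inter = union = 0
    for x, y in zip(a, b):
        active_a, active_b = abs(x) > thresh, abs(y) > thresh
        inter += active_a and active_b
        union += active_a or active_b
    return inter / union if union else 0.0


def estimate_scales(feats, thresh=0.5, prune_below=0.2):
    """Return {condition: scale in (0, 1]} with weak conditions pruned.

    Assumed score: strength * (1 - mean overlap with the other conditions),
    normalized by the largest score. Conditions whose normalized scale falls
    under prune_below are dropped.
    """
    scores = {}
    for name, feat in feats.items():
        others = [overlap(feat, g, thresh) for n, g in feats.items() if n != name]
        mean_overlap = sum(others) / len(others) if others else 0.0
        scores[name] = strength(feat) * (1.0 - mean_overlap)
    top = max(scores.values(), default=0.0)
    scales = {n: (s / top if top > 0 else 0.0) for n, s in scores.items()}
    return {n: s for n, s in scales.items() if s >= prune_below}


if __name__ == "__main__":
    feats = {
        "depth": [1.0, 1.0, 0.0, 0.0],
        "pose": [0.0, 0.0, 1.0, 1.0],
        "canny": [0.1, 0.1, 0.0, 0.0],  # weak activations everywhere
    }
    print(estimate_scales(feats))
```

Pruned conditions would simply be skipped during denoising, which is where the latency saving would come from under this reading of the abstract.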