Diffusion models deliver superior performance in visual generation tasks, but their deployment suffers from high energy consumption and inference latency. Dynamic voltage and frequency scaling (DVFS) is a promising way to exploit the potential of the underlying accelerators. However, existing approaches often yield either limited efficiency gains or degraded output quality because they overlook the inherent fault tolerance of diffusion models. In this paper, we propose DRIFT, a novel algorithm-architecture co-optimization framework that harnesses this fault tolerance for efficient and reliable diffusion model inference. We first perform a comprehensive resilience analysis of representative diffusion models. Building on these observations, we introduce a fine-grained, resilience-aware DVFS strategy that selectively protects error-sensitive network blocks and timesteps, and a rollback algorithm-based fault tolerance (ABFT) mechanism that adaptively corrects only critical errors by reverting to previous timesteps. We further optimize offloading intervals and reorganize data layouts to reduce memory overhead. Experiments across diverse models and datasets show that DRIFT achieves on average 36% energy savings through voltage underscaling or a 1.7× speedup via overclocking while maintaining generation quality.
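To make the rollback idea concrete, the following is a minimal sketch and not DRIFT's actual implementation. It assumes each timestep reduces to a single matrix multiplication, uses the classic column-checksum identity for matrix products (the column sums of C = AB equal (1ᵀA)B) as the error detector, and reverts to the most recent checkpoint only when the checksum mismatch exceeds a threshold. All names (`checksum_error`, `run_with_rollback`, `threshold`, `interval`, `faulty_steps`) are hypothetical and chosen for illustration only.

```python
def matmul(a, b):
    """Plain dense matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def checksum_error(a, b, c):
    """Largest column-checksum mismatch between (1^T A) B and 1^T C."""
    col_sum_a = [sum(row[k] for row in a) for k in range(len(b))]
    expected = [sum(col_sum_a[k] * b[k][j] for k in range(len(b)))
                for j in range(len(b[0]))]
    actual = [sum(row[j] for row in c) for j in range(len(c[0]))]
    return max(abs(e - x) for e, x in zip(expected, actual))


def run_with_rollback(x0, w, steps, faulty_steps, threshold=1e-6, interval=2):
    """Toy 'denoising' loop x <- x @ w with checksum checks and rollback.

    faulty_steps: timesteps at which a transient fault is injected once
    (an assumption standing in for errors caused by aggressive DVFS).
    threshold: mismatches at or below it are tolerated, larger ones
    trigger a rollback.
    interval: a checkpoint is saved every `interval` completed timesteps.
    """
    x, t = x0, 0
    ckpt_x, ckpt_t = x0, 0
    pending = set(faulty_steps)
    rollbacks = 0
    while t < steps:
        y = matmul(x, w)
        if t in pending:
            pending.discard(t)   # transient fault: fires only once
            y[0][0] += 1.0       # corrupt one output element
        if checksum_error(x, w, y) > threshold:
            # Critical error detected: revert to the last checkpoint.
            x, t = ckpt_x, ckpt_t
            rollbacks += 1
            continue
        x, t = y, t + 1
        if t % interval == 0:
            ckpt_x, ckpt_t = x, t
    return x, rollbacks
```

For example, with `x0 = [[1.0, 2.0], [3.0, 4.0]]`, `w = [[0.5, 0.25], [0.25, 0.5]]`, four timesteps, and a single fault injected at timestep 3, the loop rolls back once to the checkpoint taken after timestep 2 and finishes with the same result as a fault-free run. The threshold is what lets small, non-critical deviations pass without recomputation. How DRIFT actually selects critical errors and checkpoint intervals is not specified in the abstract.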