Diffusion Transformers (DiTs) have emerged as the state-of-the-art backbone for high-fidelity image and video generation. However, their massive computational cost and memory footprint hinder deployment on edge devices. While post-training quantization (PTQ) has proven effective for large language models (LLMs), directly applying existing methods to DiTs yields suboptimal results because they overlook the temporal dynamics unique to diffusion processes. In this paper, we propose AdaTSQ, a novel PTQ framework that pushes the Pareto frontier of efficiency and quality by exploiting the temporal sensitivity of DiTs. First, we propose a Pareto-aware, timestep-dynamic bit-width allocation strategy: we model the quantization policy search as a constrained pathfinding problem and use a beam search guided by end-to-end reconstruction error to assign layer-wise bit-widths across timesteps. Second, we propose a Fisher-guided temporal calibration mechanism, which leverages temporal Fisher information to prioritize calibration data from highly sensitive timesteps and integrates seamlessly with Hessian-based weight optimization. Extensive experiments on four advanced DiT models (Flux-Dev, Flux-Schnell, Z-Image, and Wan2.1) demonstrate that AdaTSQ significantly outperforms state-of-the-art methods such as SVDQuant and ViDiT-Q. Our code will be released at https://github.com/Qiushao-E/AdaTSQ.
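To make the first component concrete, the following is a minimal sketch of beam search over layer-wise bit-widths under an average-bit budget. The error proxy, sensitivity scores, and all function names here are illustrative assumptions, not the paper's actual implementation, which scores candidates by end-to-end reconstruction error.

```python
# Hypothetical sketch: beam search over per-layer bit-widths for one timestep,
# constrained so the mean bit-width stays within a budget. The error model is a
# toy proxy (error shrinks ~4x per extra bit, scaled by layer sensitivity);
# the paper's method instead uses end-to-end reconstruction error.

def recon_error(bits, sensitivity):
    """Toy proxy for quantization error of a partial bit assignment."""
    return sum(s / (4.0 ** b) for b, s in zip(bits, sensitivity))

def beam_search_bits(sensitivity, choices=(4, 8), avg_budget=6.0, beam_width=4):
    """Assign one bit-width per layer, keeping the beam_width lowest-error
    partial paths, and pruning paths that cannot meet the average budget."""
    n = len(sensitivity)
    beam = [()]  # each candidate is a tuple of bit-widths chosen so far
    for layer in range(n):
        candidates = [path + (b,) for path in beam for b in choices]
        # Prune paths infeasible even if all remaining layers take minimum bits.
        min_rest = min(choices) * (n - layer - 1)
        candidates = [c for c in candidates
                      if sum(c) + min_rest <= avg_budget * n]
        candidates.sort(key=lambda c: recon_error(c, sensitivity[:len(c)]))
        beam = candidates[:beam_width]
    return min(beam, key=lambda c: recon_error(c, sensitivity))
```

For example, with sensitivities `[10.0, 1.0, 1.0, 10.0]` and a 6-bit average budget, the search allocates 8 bits to the two sensitive layers and 4 bits to the others.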
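The second component, Fisher-guided calibration, can be sketched as weighted sampling of calibration timesteps. The per-timestep Fisher scores and the sampler below are illustrative assumptions; the paper's mechanism additionally couples this selection with Hessian-based weight optimization.

```python
import random

# Hypothetical sketch: draw calibration timesteps with probability proportional
# to a per-timestep Fisher information score, so highly sensitive timesteps
# contribute more calibration samples. Scores are assumed precomputed.

def fisher_weighted_timesteps(fisher_scores, num_samples, rng=None):
    """Return num_samples timestep indices sampled (with replacement)
    in proportion to fisher_scores."""
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    return rng.choices(range(len(fisher_scores)),
                       weights=fisher_scores, k=num_samples)
```

A timestep with zero Fisher score is never selected, while a dominant score concentrates the calibration set on that timestep.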