Diffusion Transformers (DiTs) have recently attracted substantial attention in both industry and academia for their superior visual generation capabilities, outperforming traditional diffusion models built on U-Net. However, the enhanced performance of DiTs comes with high parameter counts and implementation costs, seriously restricting their deployment on resource-limited devices such as mobile phones. To address these challenges, we introduce Hybrid Floating-point Quantization for DiT (HQ-DiT), an efficient post-training quantization method that uses 4-bit floating-point (FP) precision for both weights and activations during DiT inference. Compared with fixed-point quantization (e.g., INT8), FP quantization, complemented by our proposed clipping-range selection mechanism, naturally aligns with the data distribution within DiT, yielding minimal quantization error. Furthermore, HQ-DiT applies a universal identity mathematical transform to mitigate the severe quantization error caused by outliers. Experimental results demonstrate that DiT can be quantized to extremely low precision (i.e., 4 bits) with negligible impact on performance. Our approach is the first to quantize both weights and activations in DiTs to just 4 bits, with only a 0.12 increase in sFID on ImageNet.
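The claim that floating-point quantization suits bell-shaped weight and activation distributions better than fixed-point can be illustrated with a minimal sketch. The snippet below is an assumption-laden toy, not the paper's actual method: it uses a generic 4-bit FP grid (E2M1-style magnitudes), a naive max-value clipping range rather than the paper's clipping-range selection mechanism, and synthetic Gaussian data standing in for DiT tensors. It compares mean-squared quantization error for the FP grid versus symmetric INT4.

```python
import numpy as np

# Representable magnitudes of a generic FP4 (E2M1-style) format: the grid
# is denser near zero, matching bell-shaped weight/activation distributions.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_fp4(x, clip_max):
    """Snap |x| (clipped to clip_max) to the nearest FP4 grid point."""
    scale = clip_max / FP4_GRID[-1]
    mags = np.clip(np.abs(x), 0.0, clip_max) / scale
    idx = np.abs(mags[..., None] - FP4_GRID).argmin(axis=-1)
    return np.sign(x) * FP4_GRID[idx] * scale

def quantize_int4(x, clip_max):
    """Symmetric 4-bit fixed-point: 15 uniform levels in [-clip_max, clip_max]."""
    scale = clip_max / 7.0
    return np.clip(np.round(x / scale), -7, 7) * scale

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)   # toy stand-in for a DiT weight tensor
clip = np.abs(x).max()             # naive clipping range (max value)

mse_fp4 = np.mean((x - quantize_fp4(x, clip)) ** 2)
mse_int4 = np.mean((x - quantize_int4(x, clip)) ** 2)
print(f"FP4 MSE: {mse_fp4:.5f}  INT4 MSE: {mse_int4:.5f}")
```

On this Gaussian toy data the FP4 error comes out lower than the INT4 error, because most samples fall near zero where the FP grid spacing is finer than the uniform INT4 step; this is the intuition behind the abstract's alignment argument, not a reproduction of HQ-DiT's results.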