Despite the revolutionary breakthroughs of large-scale text-to-image diffusion models on complex vision and downstream tasks, their extremely high computational and storage costs limit their usability. Recent work has explored quantization of diffusion models to reduce compute cost and memory bandwidth usage. To further improve inference time, fast convolution algorithms such as Winograd can be applied to convolution layers, which account for a significant portion of the computation in diffusion models. However, fully quantized Winograd convolution suffers significant quality loss under existing coarser-grained post-training quantization methods, and finetuning the Winograd transformation matrices to recover that quality is complex and costly for such large models, making these approaches unsuitable for large-scale foundation models. Motivated by the wide range of values present in diffusion models, we investigate the impact of finer-grained group-wise quantization on quantizing them. While group-wise quantization can largely handle fully quantized Winograd convolution, it struggles with the large distribution imbalance in a sizable portion of the Winograd-domain computation. To reduce range differences in the Winograd domain, we propose finetuning only the scale parameters of the Winograd transform matrices, without using any domain-specific training data. Because our method does not depend on any training data, the generalization performance of the quantized diffusion model is preserved. For the text-to-image generation task, our 8-bit fully quantized diffusion model with Winograd convolution achieves near-lossless quality (FID and CLIP scores) compared to the full-precision model. For image classification, our method outperforms the state-of-the-art Winograd PTQ method by 1.62% and 2.56% in top-1 ImageNet accuracy on ResNet-18 and ResNet-34, respectively, with Winograd F(6, 3).
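To make the "fully quantized Winograd" pipeline concrete, the sketch below implements the standard 1-D Winograd F(2, 3) convolution (the paper targets the larger F(6, 3) tile, but F(2, 3) shows the same structure compactly) and optionally fake-quantizes the Winograd-domain operands with a single per-tensor scale. The transform matrices `BT`, `G`, and `AT` are the standard F(2, 3) matrices; the per-tensor quantizer is an illustrative stand-in, not the group-wise scheme or the learned scale parameters proposed in the abstract.

```python
import numpy as np

# Standard Winograd F(2, 3) transform matrices: 4 inputs and a 3-tap
# filter produce 2 outputs via Y = AT @ ((G @ g) * (BT @ d)).
BT = np.array([[1.,  0., -1.,  0.],
               [0.,  1.,  1.,  0.],
               [0., -1.,  1.,  0.],
               [0.,  1.,  0., -1.]])
G = np.array([[1.,  0.,  0.],
              [.5,  .5,  .5],
              [.5, -.5,  .5],
              [0.,  0.,  1.]])
AT = np.array([[1., 1.,  1.,  0.],
               [0., 1., -1., -1.]])

def fake_quant(x, bits=8):
    """Symmetric uniform fake quantization with one per-tensor scale
    (illustrative; the paper argues for finer-grained group-wise scales)."""
    scale = max(np.abs(x).max(), 1e-12) / (2 ** (bits - 1) - 1)
    return np.round(x / scale) * scale

def winograd_f23(d, g, quantize=False):
    """1-D Winograd conv: 4 inputs, 3-tap filter -> 2 outputs.
    With quantize=True, both Winograd-domain operands are fake-quantized,
    mimicking a fully quantized Winograd pipeline."""
    V = BT @ d                        # data transformed into the Winograd domain
    U = G @ g                         # filter transformed into the Winograd domain
    if quantize:
        V, U = fake_quant(V), fake_quant(U)
    return AT @ (U * V)               # element-wise product, then output transform

d = np.array([1., 2., 3., 4.])
g = np.array([1., 2., 3.])
exact = winograd_f23(d, g)                    # -> [14., 20.], matching direct correlation
quantized = winograd_f23(d, g, quantize=True) # close to exact on this small, balanced tile
```

On such a tiny, well-behaved tile the per-tensor quantizer is nearly lossless; the quality loss the abstract describes arises when Winograd-domain channels have very different value ranges, which is what motivates group-wise scales and finetuning the transforms' scale parameters.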