Mixed-precision quantization methods mostly fix the model's bit-width settings before training because the bit-width sampling process is non-differentiable, which yields sub-optimal performance. Worse still, the conventional static quality-consistent training setting, in which all data is assumed to be of the same quality across training and inference, overlooks the data quality changes that occur in real-world applications and may lead to poor robustness of the quantized models. In this paper, we propose a novel Data Quality-aware Mixed-precision Quantization framework, dubbed DQMQ, which dynamically adapts quantization bit-widths to different data qualities. The adaptation is based on a bit-width decision policy that is learned jointly with the quantization training. Concretely, DQMQ is modeled as a hybrid reinforcement learning (RL) task that combines model-based policy optimization with supervised quantization training. By relaxing the discrete bit-width sampling to a continuous probability distribution encoded with a few learnable parameters, DQMQ becomes differentiable and can be optimized directly end-to-end with a hybrid objective that considers both task performance and quantization benefits. Trained on mixed-quality image datasets, DQMQ implicitly selects the most suitable bit-width for each layer when facing inputs of uneven quality. Extensive experiments on various benchmark datasets and networks demonstrate the superiority of DQMQ over existing fixed-precision and mixed-precision quantization methods.
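To make the idea of relaxing discrete bit-width sampling more concrete, the following is a minimal sketch of one common way such a relaxation can be written: a softmax over a few learnable logits, with the layer output taken as the probability-weighted mixture of quantized values. It is not the DQMQ implementation; the candidate set `CANDIDATE_BITS`, the helper names, the uniform quantizer, the squared-error task loss, and the weight `lam` are all assumptions introduced here for illustration.

```python
import math

# Hypothetical candidate bit-widths; the abstract does not specify the set.
CANDIDATE_BITS = (2, 4, 8)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def quantize(x, bits):
    """Uniformly quantize a scalar x, clipped to [0, 1], using `bits` bits."""
    levels = 2 ** bits - 1
    x = min(max(x, 0.0), 1.0)
    return round(x * levels) / levels


def relaxed_quantize(x, logits):
    """Mixture of quantized values weighted by the bit-width probabilities.

    The output varies smoothly with `logits`, so the bit-width choice can
    receive gradients, unlike a hard sample of a single bit-width.
    """
    probs = softmax(logits)
    return sum(p * quantize(x, b) for p, b in zip(probs, CANDIDATE_BITS))


def expected_bits(logits):
    """Expected bit-width under the relaxed distribution (a cost proxy)."""
    probs = softmax(logits)
    return sum(p * b for p, b in zip(probs, CANDIDATE_BITS))


def hybrid_loss(x, target, logits, lam=0.01):
    """Assumed hybrid objective: task error plus a bit-width cost term."""
    task = (relaxed_quantize(x, logits) - target) ** 2
    return task + lam * expected_bits(logits)
```

With equal logits the three candidates are weighted equally, so `expected_bits([0.0, 0.0, 0.0])` is the plain average of the candidate bit-widths; raising one logit shifts both the mixture output and the cost term toward that bit-width. In a real training setup the logits would be produced per layer (and, for a data quality-aware policy, per input) and updated by an autodiff framework rather than evaluated by hand.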