Existing neural networks are memory-intensive and computationally expensive, which makes them challenging to deploy in resource-constrained environments. Several methods can improve their efficiency. Two of them are quantization, a well-known approach to network compression, and re-parametrization, an emerging technique designed to improve model performance. Although both techniques have been studied individually, there has been limited research on applying them together. To address this gap, we propose RepQ, a novel approach that applies quantization to re-parametrized networks. Our method is based on the insight that the test-stage weights of an arbitrary re-parametrized layer can be represented as a differentiable function of the trainable parameters. We enable quantization-aware training by applying quantization on top of this function. RepQ generalizes well to various re-parametrized models and outperforms the baseline LSQ quantization scheme in all experiments.
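The central idea, quantizing a merged weight that is itself a differentiable function of the trainable parameters, can be illustrated with a minimal sketch. This is not the RepQ implementation. The toy block (two parallel linear branches, one scaled by `gamma`), the function names, the fixed quantization step, and the use of a straight-through estimator for the rounding are all assumptions made for illustration. In particular, LSQ learns its step size, whereas the step is held constant here.

```python
# Minimal sketch: quantization applied on top of the merged weights of a
# re-parametrized layer. All names are hypothetical, chosen for illustration.


def merge_weights(w_a, w_b, gamma):
    """Test-stage weight as a differentiable function of trainable parameters.

    Assumed toy block: y = gamma * (w_a @ x) + w_b @ x, which collapses
    into a single matrix M = gamma * w_a + w_b.
    """
    return [[gamma * a + b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(w_a, w_b)]


def fake_quantize(w, step, bits=8):
    """Uniform symmetric quantizer: step * clamp(round(w / step), q_min, q_max)."""
    q_min, q_max = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return [[step * max(q_min, min(q_max, round(v / step))) for v in row]
            for row in w]


def matvec(w, x):
    """Plain matrix-vector product on nested lists."""
    return [sum(v * xi for v, xi in zip(row, x)) for row in w]


def ste_grads(grad_q, w_a, w_b, gamma, step, bits=8):
    """Gradients of the loss w.r.t. w_a, w_b and gamma.

    Assumption: rounding is treated as the identity inside the clamp range
    (straight-through estimator) and has zero gradient outside it. The
    gradient that reaches the merged weight is then pushed to each
    trainable parameter by the chain rule through merge_weights.
    """
    q_min, q_max = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    merged = merge_weights(w_a, w_b, gamma)
    grad_m = [[g if q_min <= v / step <= q_max else 0.0
               for g, v in zip(g_row, m_row)]
              for g_row, m_row in zip(grad_q, merged)]
    grad_w_a = [[gamma * g for g in row] for row in grad_m]  # dM/dw_a = gamma
    grad_w_b = [row[:] for row in grad_m]                    # dM/dw_b = 1
    grad_gamma = sum(g * a                                   # dM/dgamma = w_a
                     for g_row, a_row in zip(grad_m, w_a)
                     for g, a in zip(g_row, a_row))
    return grad_w_a, grad_w_b, grad_gamma


# Quantization-aware forward pass: quantize the merged weight, then apply it.
w_a = [[0.5, -1.0], [0.25, 0.75]]
w_b = [[0.1, 0.2], [-0.3, 0.0]]
gamma, step = 2.0, 0.25
x = [1.0, 2.0]

w_q = fake_quantize(merge_weights(w_a, w_b, gamma), step)
y = matvec(w_q, x)
```

Because the quantizer sees the merged weight during training, the weights it optimizes are the ones that would be deployed after the branches are folded together, rather than the per-branch weights.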