Deep Neural Networks have reached state-of-the-art performance across numerous domains, but this progress has come at the cost of increasingly large and over-parameterized models, posing serious challenges for deployment on resource-constrained devices. As a result, model compression has become essential, and -- among compression techniques -- weight quantization is widely used and particularly effective, yet it typically introduces a non-negligible accuracy drop. Moreover, it is usually applied to already trained models, without influencing how the parameter space is explored during the learning phase. In contrast, we introduce per-layer regularization terms that drive the weights to form clusters naturally during training, integrating quantization awareness directly into the optimization process. This reduces the accuracy loss typically associated with quantization methods while preserving their compression potential. Furthermore, in our framework the quantization representatives become network parameters, marking, to the best of our knowledge, the first approach to embed quantization parameters directly into the backpropagation procedure. Experiments on CIFAR-10 with AlexNet and VGG16 models confirm the effectiveness of the proposed strategy.
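To make the core idea concrete, the following is a minimal, hypothetical sketch (not the authors' implementation) of a per-layer clustering regularizer in PyTorch: each weight is penalized by its squared distance to the nearest quantization representative, and the representatives are registered as `nn.Parameter`s so that backpropagation updates them jointly with the weights. The class name, number of levels, and initialization are illustrative assumptions.

```python
import torch

class ClusterRegularizer(torch.nn.Module):
    """Sketch of a per-layer regularizer that encourages weights to cluster
    around a small set of learnable quantization representatives."""

    def __init__(self, num_levels: int = 4):
        super().__init__()
        # Quantization representatives (cluster centers) are trainable
        # parameters, so gradients from the regularization loss update them.
        self.centers = torch.nn.Parameter(torch.linspace(-1.0, 1.0, num_levels))

    def forward(self, weight: torch.Tensor) -> torch.Tensor:
        # Squared distance of every weight to every representative.
        d = (weight.reshape(-1, 1) - self.centers.reshape(1, -1)) ** 2
        # Penalize only the distance to the nearest representative, pulling
        # each weight toward one cluster center during training.
        return d.min(dim=1).values.mean()

# Usage: add lambda * regularizer(layer.weight) per layer to the task loss.
layer = torch.nn.Linear(8, 4)
reg = ClusterRegularizer(num_levels=4)
loss = reg(layer.weight)
loss.backward()  # gradients flow to both the weights and the representatives
```

Because the penalty uses only the nearest center, its gradient with respect to each weight points toward that center, while the center's own gradient averages over the weights assigned to it, much like a soft, differentiable k-means step.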