In this short note, we propose a new method for quantizing the weights of a fully trained neural network. A simple deterministic pre-processing step allows us to quantize network layers via memoryless scalar quantization while preserving the network's performance on given training data. On the one hand, the computational complexity of this pre-processing slightly exceeds that of state-of-the-art algorithms in the literature. On the other hand, our approach requires no hyper-parameter tuning and, in contrast to previous methods, admits a straightforward analysis. We provide rigorous theoretical guarantees for the quantization of single network layers and show that the relative error decays with the number of parameters in the network if the training data is well behaved, e.g., if it is sampled from suitable random distributions. The method also extends readily to deep networks by applying it to one layer after another.
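For readers unfamiliar with the term, memoryless scalar quantization rounds each weight independently to the nearest element of a finite alphabet. The following is a minimal sketch of that rounding step and of a relative error measured on data, for a single neuron. It does not include the pre-processing step, which the abstract does not specify. The equispaced alphabet, the helper names `msq`, `uniform_alphabet`, and `relative_error`, the problem sizes, and the Gaussian data are all assumptions made for illustration.

```python
import math
import random


def msq(weights, alphabet):
    """Memoryless scalar quantization: round each weight independently
    to the nearest element of a finite alphabet."""
    return [min(alphabet, key=lambda a: abs(a - w)) for w in weights]


def uniform_alphabet(radius, levels):
    """Hypothetical helper: `levels` equispaced values in [-radius, radius]."""
    step = 2.0 * radius / (levels - 1)
    return [-radius + i * step for i in range(levels)]


def relative_error(X, w, q):
    """||Xw - Xq||_2 / ||Xw||_2 for data rows X and a single neuron w."""
    num = 0.0
    den = 0.0
    for x in X:
        full = sum(xi * wi for xi, wi in zip(x, w))
        quant = sum(xi * qi for xi, qi in zip(x, q))
        num += (full - quant) ** 2
        den += full ** 2
    return math.sqrt(num / den)


random.seed(0)
n, m = 64, 32  # weights per neuron and number of data points (assumed sizes)
w = [random.uniform(-1.0, 1.0) for _ in range(n)]
X = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]

alphabet = uniform_alphabet(radius=1.0, levels=16)
q = msq(w, alphabet)
err = relative_error(X, w, q)
print(f"relative error on the data: {err:.4f}")
```

Because each weight is rounded without reference to the others or to the data, the per-weight error is at most half the alphabet spacing, but the error on `X` depends on how those rounding errors combine. This is the quantity the guarantees in the note are concerned with.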