This work focuses on reducing neural network size, a major driver of execution time, power consumption, bandwidth, and memory footprint. A key challenge is to reduce size in a way that can be readily exploited for efficient training and inference without specialized hardware. We propose Self-Compression, a simple, general method that simultaneously achieves two goals: (1) removing redundant weights, and (2) reducing the number of bits required to represent the remaining weights. It does so by using a generalized loss function to minimize overall network size. In our experiments, we demonstrate floating-point accuracy with as few as 3% of the bits and 18% of the weights remaining in the network.
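The abstract does not give the form of the generalized loss, so the following is only a minimal sketch of one way a size-aware loss could be set up, not the method itself. It assumes each output channel has its own bit depth and power-of-two scale, that a channel whose bit depth reaches zero is treated as removed, and that the size term is the average number of bits per weight. The names `quantize`, `average_bits`, `total_loss`, and the trade-off factor `gamma` are all hypothetical.

```python
import numpy as np

def quantize(w, bits, exponent):
    """Snap w to a signed fixed-point grid (assumed form, for illustration).

    bits     -- number of bits for this channel; 0 means the channel is dropped
    exponent -- power-of-two scale, i.e. the grid step is 2**exponent
    """
    bits = max(float(bits), 0.0)
    scale = 2.0 ** exponent
    lo = -(2.0 ** (bits - 1))
    hi = 2.0 ** (bits - 1) - 1
    # With bits == 0 the clip range collapses and every value rounds to zero.
    return scale * np.round(np.clip(np.asarray(w) / scale, lo, hi))

def average_bits(bit_depths, weights_per_channel):
    """Average bits per weight across channels (hypothetical size measure)."""
    b = np.maximum(np.asarray(bit_depths, dtype=float), 0.0)
    n = np.asarray(weights_per_channel, dtype=float)
    return float((b * n).sum() / n.sum())

def total_loss(task_loss, bit_depths, weights_per_channel, gamma):
    """Task loss plus a size penalty weighted by the assumed factor gamma."""
    return task_loss + gamma * average_bits(bit_depths, weights_per_channel)

# Three channels: one kept at 8 bits, one removed (0 bits), one kept at 4 bits.
depths = [8, 0, 4]
counts = [10, 10, 20]
loss = total_loss(task_loss=1.0, bit_depths=depths,
                  weights_per_channel=counts, gamma=0.5)
```

In a real training setup the bit depths would need to be learnable, which requires a gradient through the rounding step (for instance a straight-through estimator); that part is omitted here. Under these assumptions, minimizing the combined loss pushes bit depths down wherever accuracy allows, and channels that reach zero bits can be dropped entirely.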