This work focuses on reducing neural network size, which is a major driver of neural network execution time, power consumption, bandwidth, and memory footprint. A key challenge is to reduce size in a manner that can be exploited readily for efficient training and inference without the need for specialized hardware. We propose Self-Compression: a simple, general method that simultaneously achieves two goals: (1) removing redundant weights, and (2) reducing the number of bits required to represent the remaining weights. Both goals are achieved by training with a generalized loss function that minimizes overall network size. In our experiments we demonstrate floating-point accuracy with as few as 3% of the bits and 18% of the weights remaining in the network.
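To make the idea of a size-aware loss concrete, the following is a minimal sketch of how the two goals could be coupled through a single penalty term. It is an illustration under stated assumptions, not the formulation used in this work: the fixed-point quantizer, the per-channel bit depth and exponent, the weight `gamma`, and all function names (`quantize`, `average_bits`, `total_loss`) are hypothetical. In this sketch, a channel whose bit depth reaches zero quantizes to all zeros, so weight removal and bit reduction fall out of the same quantity. Only the forward computation is shown; training through the rounding step would additionally need a gradient approximation such as a straight-through estimator.

```python
def quantize(x, bits, exponent):
    """Map x onto a signed fixed-point grid (assumed form, for illustration).

    bits:     bit depth of the channel (clamped at zero)
    exponent: the grid step is 2**exponent
    """
    bits = max(bits, 0.0)
    scale = 2.0 ** exponent
    lo = -(2.0 ** (bits - 1.0))       # smallest representable integer code
    hi = 2.0 ** (bits - 1.0) - 1.0    # largest representable integer code
    clipped = min(max(x / scale, lo), hi)
    # With bits == 0, lo == hi == -0.5, which rounds to 0: the weight vanishes.
    return scale * round(clipped)


def quantize_channel(weights, bits, exponent):
    """Quantize every weight of one output channel with shared parameters."""
    return [quantize(w, bits, exponent) for w in weights]


def average_bits(bits_per_channel, weights_per_channel):
    """Average number of bits per weight across the whole network."""
    total_bits = sum(max(b, 0.0) * n
                     for b, n in zip(bits_per_channel, weights_per_channel))
    return total_bits / sum(weights_per_channel)


def total_loss(task_loss, bits_per_channel, weights_per_channel, gamma):
    """Task loss plus a size penalty; gamma (hypothetical) trades accuracy for size."""
    return task_loss + gamma * average_bits(bits_per_channel, weights_per_channel)


if __name__ == "__main__":
    kept = quantize_channel([0.3, -1.2, 0.7], bits=4.0, exponent=-2.0)
    removed = quantize_channel([0.5, 0.25, -0.1], bits=0.0, exponent=-2.0)
    print(kept, removed)
    print(total_loss(1.0, [4.0, 0.0], [3, 3], gamma=0.1))
```

In the usage example, the first channel keeps 4 bits per weight while the second has been driven to 0 bits and is therefore removable, giving an average of 2 bits per weight across the six weights.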