Neural networks achieve state-of-the-art performance in image classification, speech recognition, scientific analysis and many other application areas. Due to the high computational complexity and memory footprint of neural networks, various compression techniques, such as pruning and quantization, have been proposed in the literature. Pruning sparsifies a neural network, reducing both the number of multiplications and the memory footprint. However, pruning often fails to capture properties of the underlying hardware, causing unstructured sparsity and load imbalance, which bottleneck resource savings. We propose a hardware-centric formulation of pruning as a knapsack problem with resource-aware tensor structures. Evaluated on a range of tasks, including sub-microsecond particle classification at CERN's Large Hadron Collider and fast image classification, the proposed method reduces DSP utilization by 55% to 92% and BRAM utilization by up to 81%.
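The knapsack view of pruning can be illustrated with a generic 0/1 knapsack solved by dynamic programming. This is a minimal sketch under stated assumptions, not the authors' formulation: the abstract does not define the objective or the costs, so here each prunable tensor structure is assumed to have a non-negative importance score (e.g. the L1 norm of its weights) and an integer cost in a single hardware resource (e.g. the DSPs it would occupy). The function name `knapsack_select` and its parameters are hypothetical; the actual method may use other importance measures and several resource constraints (such as DSP and BRAM) at once.

```python
from typing import List, Tuple


def knapsack_select(values: List[float], costs: List[int], budget: int) -> Tuple[float, List[int]]:
    """Hypothetical sketch: pick which tensor structures to keep via 0/1 knapsack.

    values[i]: assumed non-negative importance of structure i (e.g. L1 norm of its weights)
    costs[i]:  assumed integer resource cost of structure i (e.g. DSPs it would use)
    budget:    total resource budget for the kept structures
    Returns (total kept importance, sorted indices of kept structures);
    every structure not returned is pruned.
    """
    n = len(values)
    # best[i][b] = max total importance using only the first i structures within budget b
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        v, c = values[i - 1], costs[i - 1]
        for b in range(budget + 1):
            # Default: prune structure i-1 (do not spend any budget on it)
            best[i][b] = best[i - 1][b]
            # Keep structure i-1 only if it fits and strictly improves the objective
            if c <= b and best[i - 1][b - c] + v > best[i][b]:
                best[i][b] = best[i - 1][b - c] + v

    # Backtrack: a structure was kept exactly when it changed the table entry
    kept = []
    b = budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            kept.append(i - 1)
            b -= costs[i - 1]
    return best[n][budget], sorted(kept)


if __name__ == "__main__":
    # Four hypothetical structures competing for a budget of 4 resource units
    print(knapsack_select([0.9, 0.4, 0.7, 0.2], [2, 1, 2, 1], 4))
```

With importances `[0.9, 0.4, 0.7, 0.2]`, costs `[2, 1, 2, 1]` and a budget of 4, the sketch keeps structures 0 and 2 and prunes the other two. Dynamic programming is used here because it solves the 0/1 knapsack exactly in O(n · budget) time for integer costs; larger budgets or multiple resources would call for a different solver.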