EIE proposed accelerating pruned and compressed neural networks by exploiting weight sparsity, activation sparsity, and 4-bit weight sharing in neural network accelerators. Since its publication at ISCA'16, it has opened a new design space for accelerating pruned and sparse neural networks, and it has spawned many algorithm-hardware co-designs for model compression and acceleration, both in academia and in commercial AI chips. In this retrospective, we review the background of the project, summarize its pros and cons, and discuss new opportunities for pruning, sparsity, and low precision to accelerate emerging deep learning workloads.
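The three mechanisms named above can be illustrated with a small software sketch. It assumes a compressed sparse column (CSC) layout in which each stored nonzero weight is a 4-bit index into a shared codebook of at most 16 values. The names `SharedWeightCSC`, `compress`, and `spmv` are hypothetical. Weights are snapped to the nearest entry of a given codebook, which is a simplification of learned weight sharing. This is a plain Python illustration of the idea, not EIE's actual hardware dataflow.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SharedWeightCSC:
    """Hypothetical container: a pruned weight matrix in CSC form whose
    stored nonzeros are 4-bit indices into a shared codebook."""
    n_rows: int
    codebook: List[float]  # at most 16 shared weight values (4-bit indices)
    col_ptr: List[int]     # column j's nonzeros live in [col_ptr[j], col_ptr[j+1])
    row_idx: List[int]     # row index of each stored nonzero
    code_idx: List[int]    # codebook index (0..15) of each stored nonzero


def compress(dense: List[List[float]], codebook: List[float]) -> SharedWeightCSC:
    """Drop zero (pruned) weights and snap the survivors to the nearest shared value."""
    assert 0 < len(codebook) <= 16, "a 4-bit index addresses at most 16 values"
    n_rows, n_cols = len(dense), len(dense[0])
    col_ptr, row_idx, code_idx = [0], [], []
    for j in range(n_cols):
        for i in range(n_rows):
            w = dense[i][j]
            if w != 0.0:  # weight sparsity: pruned weights are never stored
                k = min(range(len(codebook)), key=lambda c: abs(codebook[c] - w))
                row_idx.append(i)
                code_idx.append(k)
        col_ptr.append(len(row_idx))
    return SharedWeightCSC(n_rows, list(codebook), col_ptr, row_idx, code_idx)


def spmv(W: SharedWeightCSC, x: List[float]) -> List[float]:
    """Compute y = W @ x, touching only nonzero activations and nonzero weights."""
    y = [0.0] * W.n_rows
    for j, a in enumerate(x):
        if a == 0.0:
            continue  # activation sparsity: skip the whole column
        for p in range(W.col_ptr[j], W.col_ptr[j + 1]):
            # weight sharing: decode the 4-bit index through the codebook
            y[W.row_idx[p]] += W.codebook[W.code_idx[p]] * a
    return y


if __name__ == "__main__":
    dense = [[0.5, 0.0, -1.0],
             [0.0, 0.0, 0.5],
             [1.0, 0.0, 0.0]]
    W = compress(dense, codebook=[-1.0, 0.5, 1.0])
    print(spmv(W, [2.0, 3.0, 0.0]))  # → [1.0, 0.0, 2.0]
```

In this example, the all-zero middle column stores nothing, and the third column is never read because its activation is zero. Only four codebook indices are kept, in place of nine dense weights.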