Efficient inference of Deep Neural Networks (DNNs) is essential to making AI ubiquitous. Two algorithmic techniques have shown promise for enabling efficient inference: sparsity and binarization. At the hardware-software level, these techniques translate into weight sparsity and weight repetition, which enable the deployment of DNNs with critically low power and latency requirements. We propose a new method, signed-binary networks, that improves efficiency further by exploiting weight sparsity and weight repetition together while maintaining similar accuracy. On the ImageNet and CIFAR10 datasets, our method achieves accuracy comparable to binary networks and can lead to 69% sparsity. We observe real speedup when deploying these models on general-purpose devices, and we show that this high percentage of unstructured sparsity can further reduce energy consumption on ASICs.
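To make the two properties named in the abstract concrete, the following is a minimal sketch of how weight repetition and weight sparsity can each reduce the work in a single dot product. It is an illustration under stated assumptions, not the paper's implementation: the function names, the example filter, and the choice of a weight set {0, +1} are all hypothetical, since the abstract does not specify the values a signed-binary filter may take.

```python
from collections import defaultdict


def dense_dot(weights, inputs):
    """Baseline dot product: one multiply per weight."""
    return sum(w * x for w, x in zip(weights, inputs))


def repetition_sparse_dot(weights, inputs):
    """Dot product that exploits repeated and zero-valued weights.

    Returns (result, number_of_multiplies, number_of_skipped_inputs).
    """
    partial_sums = defaultdict(float)
    skipped = 0
    for w, x in zip(weights, inputs):
        if w == 0:
            # Sparsity: a zero weight contributes nothing, so skip the input.
            skipped += 1
            continue
        # Repetition: inputs that share a weight value are summed first.
        partial_sums[w] += x
    # One multiply per distinct nonzero weight value.
    result = sum(w * s for w, s in partial_sums.items())
    return result, len(partial_sums), skipped


# Hypothetical filter (assumption for illustration): values drawn from {0, +1}.
weights = [0, 1, 0, 0, 1, 0, 1, 0]
inputs = [0.5, -1.0, 2.0, 3.0, 0.25, -4.0, 1.5, 6.0]

result, multiplies, skipped = repetition_sparse_dot(weights, inputs)
print(result, multiplies, skipped)
```

In this toy case the grouped version needs one multiply instead of eight and never touches the inputs paired with zero weights. Whether such savings appear on real hardware depends on the kernel and memory layout, which this sketch does not model.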