Binary neural networks (BNNs) are an attractive solution for developing and deploying deep neural network (DNN)-based applications on resource-constrained devices. Despite their success, BNNs still suffer from a fixed and limited compression factor, which may be explained by the fact that existing pruning methods for full-precision DNNs cannot be directly applied to BNNs. In fact, weight pruning of BNNs leads to performance degradation, which suggests that the standard binarization domain of BNNs is not well adapted to the task. This work proposes a novel, more general binary domain that extends the standard one and is more robust to pruning techniques, thus guaranteeing improved compression and avoiding severe performance losses. We present a closed-form solution for quantizing the weights of a full-precision network into the proposed binary domain. Finally, we show the flexibility of our method, which can be combined with other pruning strategies. Experiments on CIFAR-10 and CIFAR-100 demonstrate that the approach generates efficient sparse networks with reduced memory usage and run-time latency while maintaining performance.
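As a purely illustrative sketch of the representational mismatch between pruning and the standard binary domain, the Python snippet below binarizes a toy weight vector with the widely used sign-and-scale scheme (w ≈ α·sign(w), with α the mean absolute weight) and then applies naive magnitude pruning. This is not the method proposed in this work: the function names `binarize` and `prune_smallest`, the toy weights, and the 50% pruning fraction are all assumptions chosen for illustration. The sketch shows only that zeroing weights introduces a third value outside {-1, +1}; it does not measure accuracy.

```python
def binarize(weights):
    """Standard sign-and-scale binarization: w ~ alpha * sign(w).

    alpha = mean(|w|) is the usual closed-form scaling factor for the
    {-1, +1} domain (an assumption here, not the domain of this work).
    """
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return alpha, signs


def prune_smallest(weights, signs, fraction):
    """Hypothetical helper: zero the entries with the smallest |w|."""
    k = int(len(weights) * fraction)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(signs)
    for i in order[:k]:
        pruned[i] = 0  # zero is not a member of {-1, +1}
    return pruned


# Toy full-precision weights (made up for illustration).
weights = [0.9, -0.1, 0.4, -0.7, 0.05, -0.3]
alpha, signs = binarize(weights)
pruned = prune_smallest(weights, signs, fraction=0.5)

print(sorted(set(signs)))   # distinct values before pruning
print(sorted(set(pruned)))  # distinct values after pruning
```

After pruning, the weight tensor takes three distinct values, so it no longer fits a one-bit-per-weight encoding without extra bookkeeping such as a separate sparsity mask.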