Enabling Binary Neural Network Training on the Edge

The ever-growing computational demands of increasingly complex machine learning models frequently necessitate the use of powerful cloud-based infrastructure for their training. Binary neural networks are known to be promising candidates for on-device inference due to their extreme compute and memory savings over higher-precision alternatives. However, their existing training methods require the concurrent storage of high-precision activations for all layers, generally making learning on memory-constrained devices infeasible. In this paper, we demonstrate that the backward propagation operations needed for binary neural network training are strongly robust to quantization, thereby making on-the-edge learning with modern models a practical proposition. We introduce a low-cost binary neural network training strategy that achieves sizable reductions in memory footprint and energy while inducing little to no accuracy loss versus Courbariaux & Bengio's standard approach. These resource decreases are primarily enabled by retaining activations exclusively in binary format. Compared with the standard approach, our drop-in replacement reduces both memory requirements and energy consumption by 2--6$\times$, while reaching similar test accuracy in comparable time, across a range of small-scale models trained to classify popular datasets. We also demonstrate from-scratch ImageNet training of binarized ResNet-18, achieving a 3.12$\times$ memory reduction. Such savings will allow unnecessary cloud offloading to be avoided, reducing latency, increasing energy efficiency and safeguarding privacy.
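The key observation, that a binary layer's weight gradient only needs the binarized input activations rather than their high-precision values, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the single fully connected layer, the deterministic sign binarization, the bit-packing scheme and all function names (`binarize`, `pack_activations`, `unpack_activations`, `binary_linear_forward`, `binary_linear_weight_grad`) are hypothetical choices made for illustration. Only the weight-gradient path is shown; the straight-through gradient with respect to the layer input is omitted.

```python
import numpy as np

def binarize(x):
    # Deterministic sign binarization to {-1, +1}; mapping 0 to +1 is an assumption.
    return np.where(x >= 0, 1.0, -1.0).astype(np.float32)

def pack_activations(x_bin):
    # Hypothetical storage format: +1 -> bit 1, -1 -> bit 0, eight activations per byte.
    bits = (x_bin > 0).astype(np.uint8)
    return np.packbits(bits, axis=-1), x_bin.shape

def unpack_activations(packed, shape):
    # Recover the {-1, +1} tensor exactly from the packed bits.
    bits = np.unpackbits(packed, axis=-1, count=shape[-1])
    return bits.astype(np.float32) * 2.0 - 1.0

def binary_linear_forward(x, w):
    # Forward pass of a hypothetical binary linear layer.
    x_bin = binarize(x)
    y = x_bin @ binarize(w)
    # Only the 1-bit packed input is kept for the backward pass, not float32 x.
    return y, pack_activations(x_bin)

def binary_linear_weight_grad(cache, grad_y):
    # dL/dW = x_bin^T @ dL/dy, computed from the binary activations alone.
    return unpack_activations(*cache).T @ grad_y

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 1000)).astype(np.float32)
w = rng.standard_normal((1000, 10)).astype(np.float32)
y, cache = binary_linear_forward(x, w)
grad_w = binary_linear_weight_grad(cache, np.ones_like(y))
print(x.nbytes, cache[0].nbytes)  # prints: 256000 8000
```

Under these assumptions the saved tensor shrinks by 32×, since each float32 activation (32 bits) is replaced by a single bit. That ratio applies only to this one buffer; whole-training memory figures, such as those quoted above, also include weights, gradients and other state not modelled here, so they need not match it.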
