Binarizing by Classification: Is soft function really necessary?

Binary neural networks leverage $\mathrm{Sign}$ function to binarize weights and activations, which require gradient estimators to overcome its non-differentiability and will inevitably bring gradient errors during backpropagation. Although many hand-designed soft functions have been proposed as gradient estimators to better approximate gradients, their mechanism is not clear and there are still huge performance gaps between binary models and their full-precision counterparts. To address these issues and reduce gradient error, we propose to tackle network binarization as a binary classification problem and use a multi-layer perceptron (MLP) as the classifier in the forward pass and gradient estimator in the backward pass. Benefiting from the MLP's theoretical capability to fit any continuous function, it can be adaptively learned to binarize networks and backpropagate gradients without any prior knowledge of soft functions. From this perspective, we further empirically justify that even a simple linear function can outperform previous complex soft functions. Extensive experiments demonstrate that the proposed method yields surprising performance both in image classification and human pose estimation tasks. Specifically, we achieve $65.7\%$ top-1 accuracy of ResNet-34 on ImageNet dataset, with an absolute improvement of $2.6\%$. Moreover, we take binarization as a lightweighting approach for pose estimation models and propose well-designed binary pose estimation networks SBPN and BHRNet. When evaluating on the challenging Microsoft COCO keypoint dataset, the proposed method enables binary networks to achieve a mAP of up to $60.6$ for the first time. Experiments conducted on real platforms demonstrate that BNN achieves a better balance between performance and computational complexity, especially when computational resources are extremely low.

翻译：二值神经网络利用符号函数对权重和激活值进行二值化，这需要梯度估计器来克服其不可微性，并在反向传播过程中不可避免地引入梯度误差。尽管已有许多手工设计的软函数作为梯度估计器来更好地近似梯度，但其机制尚不明确，且二值模型与其全精度对应模型之间仍存在巨大的性能差距。为解决这些问题并减少梯度误差，我们提出将网络二值化视为一个二分类问题，并在前向传播中使用多层感知机作为分类器，同时在反向传播中作为梯度估计器。得益于多层感知机在理论上能够拟合任意连续函数的能力，它可以在无需任何软函数先验知识的情况下自适应地学习二值化网络并反向传播梯度。基于这一视角，我们进一步通过实验证明，即使是一个简单的线性函数也能超越以往复杂的软函数。大量实验表明，所提方法在图像分类和人体姿态估计任务中均取得了惊人的性能。具体而言，我们在ImageNet数据集上使用ResNet-34实现了65.7%的top-1准确率，绝对提升达2.6%。此外，我们将二值化作为姿态估计模型的轻量化手段，并设计了精心构造的二值姿态估计网络SBPN和BHRNet。在具有挑战性的Microsoft COCO关键点数据集上评估时，所提方法首次使二值网络实现了高达60.6的平均精度。在实际平台上的实验表明，BNN在性能与计算复杂度之间取得了更好的平衡，尤其是在计算资源极度受限的情况下。