In neural network binarization, BinaryConnect (BC) and its variants are widely regarded as the standard approach. These methods apply the sign function in the forward pass and backpropagate the resulting gradients to update the weights. However, the derivative of the sign function is zero wherever it is defined, which freezes training. Implementations of BC (e.g., BNN) therefore usually replace the derivative of sign in the backward computation with the identity or another approximate gradient. Although this practice works well empirically, it is largely a heuristic or "training trick." We aim to shed light on these training tricks from an optimization perspective. Building on existing theory for ProxConnect (PC, a generalization of BC), we (1) equip PC with different forward-backward quantizers and obtain ProxConnect++ (PC++), which includes existing binarization techniques as special cases; (2) derive a principled way to synthesize forward-backward quantizers with automatic theoretical guarantees; (3) illustrate our theory by proposing an enhanced binarization algorithm, BNN++; and (4) conduct image classification experiments on CNNs and vision transformers, and empirically verify that BNN++ generally achieves competitive results when binarizing these models.
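To make the "training trick" concrete, the following is a minimal NumPy sketch of the generic idea described above: the forward pass uses sign, and the backward pass swaps the (zero) derivative of sign for a surrogate. It is not the PC++ or BNN++ algorithm; the function names (`sign_forward`, `true_sign_grad`, `identity_ste`, `clipped_ste`, `train_binary_linear`), the least-squares toy objective, and the hyperparameters are all assumptions made for illustration.

```python
import numpy as np

def sign_forward(w):
    """Forward quantizer: map each latent weight to -1 or +1 (0 goes to +1)."""
    return np.where(w >= 0, 1.0, -1.0)

def true_sign_grad(w, grad_out):
    """Exact chain rule through sign: its derivative is 0 wherever defined."""
    return np.zeros_like(grad_out)

def identity_ste(w, grad_out):
    """Surrogate backward pass: pretend d sign(w)/dw = 1 (identity)."""
    return grad_out

def clipped_ste(w, grad_out):
    """Another common surrogate: pass the gradient only where |w| <= 1."""
    return grad_out * (np.abs(w) <= 1.0)

def train_binary_linear(X, y, backward, lr=0.05, steps=100, seed=0):
    """Fit y ~ X @ sign(w) by gradient descent on latent real weights w.

    `backward` decides how the gradient w.r.t. the binary weights is
    passed back to the latent weights (hypothetical interface).
    Returns the initial latent weights and the final binary weights.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.1, size=X.shape[1])  # latent real-valued weights
    w_init = w.copy()
    for _ in range(steps):
        w_bin = sign_forward(w)                 # forward pass uses binary weights
        residual = X @ w_bin - y
        grad_bin = X.T @ residual / len(y)      # gradient w.r.t. binary weights
        # Update the latent weights, then keep them in [-1, 1].
        w = np.clip(w - lr * backward(w, grad_bin), -1.0, 1.0)
    return w_init, sign_forward(w)
```

Passing `true_sign_grad` leaves the latent weights at their initial values, which is the "frozen training" problem; passing `identity_ste` or `clipped_ste` lets the latent weights move, so the binary weights can change sign during training.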