Binary Neural Networks (BNNs) have garnered significant attention due to their immense potential for deployment on edge devices. However, the non-differentiability of the quantization function poses a challenge for the optimization of BNNs, as its derivative cannot be backpropagated. To address this issue, hypernetwork-based methods, which use neural networks to learn the gradients of non-differentiable quantization functions, have emerged as a promising approach owing to their capacity to adaptively reduce estimation errors. However, existing hypernetwork-based methods typically rely solely on current gradient information, neglecting the influence of historical gradients. This oversight can lead to accumulated gradient errors when computing gradient momentum during optimization. To incorporate historical gradient information, we design a Historical Gradient Storage (HGS) module, which models the historical gradient sequence to generate the first-order momentum required for optimization. To further enhance gradient generation in hypernetworks, we propose a Fast and Slow Gradient Generation (FSG) method. Additionally, to produce more precise gradients, we introduce Layer Recognition Embeddings (LRE) into the hypernetwork, facilitating the generation of layer-specific fine gradients. Extensive comparative experiments on the CIFAR-10 and CIFAR-100 datasets demonstrate that our method achieves faster convergence and lower loss values, outperforming existing baselines. Code is available at http://github.com/two-tiger/FSG.
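To make the HGS idea concrete, the following is a minimal conceptual sketch, not the authors' implementation: a buffer stores a bounded window of past gradients and derives a first-order momentum estimate from the stored sequence (here a plain exponential moving average stands in for the paper's learned hypernetwork recurrence; the class name, `window`, and `beta` are all illustrative assumptions).

```python
from collections import deque
import numpy as np

class HistoricalGradientStorage:
    """Illustrative stand-in for the paper's HGS module: keeps a bounded
    window of past gradients and produces a first-order momentum estimate
    from the stored sequence (EMA here; the paper learns this mapping)."""

    def __init__(self, window=8, beta=0.9):
        self.buffer = deque(maxlen=window)  # oldest gradients are evicted
        self.beta = beta                    # momentum decay factor

    def push(self, grad):
        # Record the newest gradient estimate for this parameter group.
        self.buffer.append(np.asarray(grad, dtype=float))

    def momentum(self):
        # Re-derive first-order momentum from the whole stored sequence,
        # oldest first, instead of trusting a single running accumulator
        # built from possibly erroneous estimated gradients.
        m = np.zeros_like(self.buffer[0])
        for g in self.buffer:
            m = self.beta * m + (1.0 - self.beta) * g
        return m
```

Recomputing momentum from the stored window, rather than updating one running scalar in place, is what lets errors in earlier estimated gradients age out of the buffer instead of accumulating indefinitely.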