Binary Neural Networks (BNNs) have garnered significant attention due to their immense potential for deployment on edge devices. However, the non-differentiability of the quantization function poses a challenge for the optimization of BNNs, as its derivative cannot be backpropagated. To address this issue, hypernetwork-based methods, which use neural networks to learn the gradients of non-differentiable quantization functions, have emerged as a promising approach owing to their capacity to adaptively reduce estimation errors. However, existing hypernetwork-based methods typically rely solely on current gradient information, neglecting the influence of historical gradients. This oversight can lead to accumulated gradient errors when computing gradient momentum during optimization. To incorporate historical gradient information, we design a Historical Gradient Storage (HGS) module, which models the historical gradient sequence to generate the first-order momentum required for optimization. To further enhance gradient generation in hypernetworks, we propose a Fast and Slow Gradient Generation (FSG) method. Additionally, to produce more precise gradients, we introduce Layer Recognition Embeddings (LRE) into the hypernetwork, facilitating the generation of layer-specific fine gradients. Extensive comparative experiments on the CIFAR-10 and CIFAR-100 datasets demonstrate that our method achieves faster convergence and lower loss values, outperforming existing baselines. Code is available at http://github.com/two-tiger/FSG.
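To make the HGS idea concrete, the following is a minimal conceptual sketch, not the authors' implementation: a buffer stores a bounded window of past gradients and derives a first-order momentum estimate from the stored sequence (here a plain exponential moving average stands in for the paper's learned hypernetwork recurrence; the class name, `window`, and `beta` are all illustrative assumptions).

```python
from collections import deque
import numpy as np

class HistoricalGradientStorage:
    """Illustrative stand-in for the paper's HGS module: keeps a bounded
    window of past gradients and produces a first-order momentum estimate
    from the stored sequence (EMA here; the paper learns this mapping)."""

    def __init__(self, window=8, beta=0.9):
        self.buffer = deque(maxlen=window)  # oldest gradients are evicted
        self.beta = beta                    # momentum decay factor

    def push(self, grad):
        # Record the newest gradient estimate for this parameter group.
        self.buffer.append(np.asarray(grad, dtype=float))

    def momentum(self):
        # Re-derive first-order momentum from the whole stored sequence,
        # oldest first, instead of trusting a single running accumulator
        # built from possibly erroneous estimated gradients.
        m = np.zeros_like(self.buffer[0])
        for g in self.buffer:
            m = self.beta * m + (1.0 - self.beta) * g
        return m
```

Recomputing momentum from the stored window, rather than updating one running scalar in place, is what lets errors in earlier estimated gradients age out of the buffer instead of accumulating indefinitely.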