A common technique for reducing the computational cost of large neural models is sparsification: the pruning of neural connections during training. Sparse models can maintain the high accuracy of state-of-the-art models while operating at the computational cost of far more parsimonious ones. The structures underlying sparse architectures are, however, poorly understood and inconsistent across differently trained models and sparsification schemes. In this paper, we propose a new technique for the sparsification of recurrent neural networks (RNNs), called moduli regularization, in combination with magnitude pruning. Moduli regularization leverages the dynamical system induced by the recurrent structure to impose a geometric relationship among the neurons of the RNN's hidden state. By making our regularizing term explicitly geometric, we provide the first, to our knowledge, a priori description of the desired sparse architecture of a neural network, as well as explicit end-to-end learning of RNN geometry. We verify the effectiveness of our scheme under diverse conditions, testing it on RNNs for navigation, natural language processing, and addition. Navigation is a structurally geometric task with known moduli spaces, and we show that regularization reaches 90% sparsity while maintaining model performance only when its coefficients are chosen in accordance with a suitable moduli space. Natural language processing and addition, by contrast, have no known moduli space in which their computations are performed. Nevertheless, we show that moduli regularization induces more stable recurrent neural networks and achieves high-fidelity models at above 90% sparsity.
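To make the mechanism concrete, the following is a minimal, hypothetical sketch of moduli regularization combined with magnitude pruning, assuming the circle S^1 as the moduli space and PyTorch as the framework. All names and hyperparameters here (n_hidden, lambda_reg, sparsity, moduli_penalty, magnitude_prune) are illustrative assumptions, not the paper's implementation.

```python
import torch

n_hidden = 128        # illustrative hidden-state size (assumption)
lambda_reg = 1e-3     # illustrative regularization strength (assumption)

# Assign each hidden neuron a fixed position on the circle S^1,
# the moduli space assumed for this sketch.
positions = torch.rand(n_hidden) * 2 * torch.pi

# Pairwise geodesic (wrap-around) distances between neuron positions on S^1.
diff = torch.abs(positions[:, None] - positions[None, :])
dist = torch.minimum(diff, 2 * torch.pi - diff)

def moduli_penalty(w_hh: torch.Tensor) -> torch.Tensor:
    """Penalize each recurrent weight in proportion to the distance
    between the neurons it connects, so training favors geometrically
    local connections on the moduli space."""
    return lambda_reg * (w_hh.abs() * dist).sum()

def magnitude_prune(w: torch.Tensor, sparsity: float = 0.9) -> torch.Tensor:
    """Zero out the smallest-magnitude fraction of weights."""
    k = int(w.numel() * sparsity)
    threshold = w.abs().flatten().kthvalue(k).values
    return w * (w.abs() > threshold).float()

# Hypothetical usage during training:
#   loss = task_loss + moduli_penalty(rnn.weight_hh_l0)
# followed by magnitude pruning of the recurrent weights:
#   rnn.weight_hh_l0.data = magnitude_prune(rnn.weight_hh_l0.data)
```

Under these assumptions, weights spanning large distances on the moduli space are driven toward zero, so the connections that survive magnitude pruning tend to link nearby neurons, yielding the geometrically structured sparse architecture the abstract describes.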