A common technique for reducing the computational cost of running large neural models is sparsification, the removal of neural connections during training. Sparse models can maintain the high accuracy of state-of-the-art models while running at the cost of far more parsimonious models. The structures that underlie sparse architectures are, however, poorly understood, and are not consistent across differently trained models and sparsification schemes. In this paper, we propose a new technique for sparsifying recurrent neural networks (RNNs), called moduli regularization, used in combination with magnitude pruning. Moduli regularization leverages the dynamical system induced by the recurrent structure to impose a geometric relationship between the neurons in the hidden state of the RNN. By making the regularizing term explicitly geometric, we provide, to our knowledge, the first a priori description of the desired sparse architecture of a neural network. We verify the effectiveness of our scheme on navigation and natural language processing RNNs. Navigation is a structurally geometric task with known moduli spaces, and we show that regularization reaches 90% sparsity while maintaining model performance only when coefficients are chosen in accordance with a suitable moduli space. Natural language processing, by contrast, has no known moduli space in which its computations are performed. Nevertheless, we show that moduli regularization yields more stable recurrent neural networks across a variety of moduli regularizers, and achieves high-fidelity models at 98% sparsity.
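To make the general idea concrete, the following is a minimal sketch of geometry-weighted regularization of an RNN's hidden-to-hidden weights, combined with one-shot magnitude pruning. The choice of a circle as the moduli space, the distance-weighted L1 penalty in `moduli_penalty`, the coefficient construction in `moduli_coefficients`, and the pruning threshold are illustrative assumptions, not the paper's exact formulation.

```python
import torch

# Assign each hidden neuron a position on a chosen moduli space (here, a circle).
def circle_positions(n_hidden):
    return 2 * torch.pi * torch.arange(n_hidden) / n_hidden  # angles in [0, 2*pi)

def moduli_coefficients(n_hidden):
    theta = circle_positions(n_hidden)
    # Geodesic distance on the circle between every pair of hidden neurons.
    diff = (theta[:, None] - theta[None, :]).abs()
    return torch.minimum(diff, 2 * torch.pi - diff)

def moduli_penalty(W_hh, coeffs, lam=1e-3):
    # Distance-weighted L1 penalty on the recurrent weights: connections between
    # geometrically distant neurons are penalized more strongly (an assumed form).
    return lam * (coeffs * W_hh.abs()).sum()

def magnitude_prune(W_hh, sparsity=0.9):
    # Zero out the smallest-magnitude recurrent weights to hit the target sparsity.
    k = int(sparsity * W_hh.numel())
    threshold = W_hh.abs().flatten().kthvalue(k).values
    mask = (W_hh.abs() > threshold).float()
    return W_hh * mask, mask

# Usage sketch: add the penalty to the task loss during training, then prune.
rnn = torch.nn.RNN(input_size=32, hidden_size=128, batch_first=True)
coeffs = moduli_coefficients(128)
x = torch.randn(4, 10, 32)
out, h = rnn(x)
task_loss = out.pow(2).mean()  # placeholder standing in for the real task loss
loss = task_loss + moduli_penalty(rnn.weight_hh_l0, coeffs)
loss.backward()
with torch.no_grad():
    pruned_W, mask = magnitude_prune(rnn.weight_hh_l0)
```

The key design choice illustrated here is that the regularization coefficients are fixed a priori by distances on the moduli space, so the sparse connectivity pattern is described before training rather than discovered post hoc by pruning alone.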