Activity sparsity and parameter sparsity are two standard ways to make neural networks more computationally efficient. Event-based architectures such as spiking neural networks (SNNs) naturally exhibit activity sparsity, and many methods exist to sparsify their connectivity by pruning weights. The effect of weight pruning on feed-forward SNNs has been studied for computer vision tasks. Its effects on complex sequence tasks like language modeling are less well understood, because SNNs have traditionally struggled to achieve meaningful performance on these tasks. Using a recently published SNN-like architecture that works well on small-scale language modeling, we study the effects of weight pruning when combined with activity sparsity. Specifically, we study the trade-off between the multiplicative efficiency gains this combination affords and its effect on language modeling performance. To dissect the effects of the two sparsities, we compare densely activated models with sparsely activated event-based models across varying degrees of connectivity sparsity. We demonstrate that sparse activity and sparse connectivity complement each other without a proportional drop in task performance for an event-based neural network trained on the Penn Treebank and WikiText-2 language modeling datasets. Our results suggest that sparsely connected event-based neural networks are promising candidates for effective and efficient sequence modeling.
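The claim that the two sparsities give *multiplicative* efficiency gains can be illustrated with a toy matrix-vector product. The sketch below makes several assumptions that are not from the paper. It uses unstructured magnitude pruning, which is one common criterion and not necessarily the one the authors use. It treats activations as binary events and counts multiply-accumulate operations (MACs) as a proxy for compute cost. The names `magnitude_prune` and `event_driven_matvec` are hypothetical. Columns whose input emits no event are skipped entirely, and pruned weights are skipped within active columns. The cost therefore scales roughly as (1 - activity sparsity) × (1 - weight sparsity) of the dense cost.

```python
import random


def magnitude_prune(weights, sparsity):
    """Hypothetical helper: zero the `sparsity` fraction of weights with the
    smallest absolute value (unstructured magnitude pruning, per matrix)."""
    rows, cols = len(weights), len(weights[0])
    order = sorted((abs(weights[i][j]), i, j) for i in range(rows) for j in range(cols))
    k = int(round(sparsity * rows * cols))  # number of weights to remove
    pruned = [row[:] for row in weights]
    for _, i, j in order[:k]:
        pruned[i][j] = 0.0
    return pruned


def event_driven_matvec(weights, activations):
    """Hypothetical helper: compute y = W @ x while touching only columns whose
    input is nonzero (an "event") and only nonzero weights in those columns.
    Returns (y, number of MACs actually performed)."""
    rows = len(weights)
    y = [0.0] * rows
    macs = 0
    for j, x_j in enumerate(activations):
        if x_j == 0.0:
            continue  # no event from unit j: skip its whole column
        for i in range(rows):
            w = weights[i][j]
            if w != 0.0:  # pruned connection: skip
                y[i] += w * x_j
                macs += 1
    return y, macs


random.seed(0)
n_out, n_in = 100, 200
W = [[random.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]

# Assumed 80% activity sparsity: only 40 of the 200 input units emit an event.
x = [1.0 if j % 5 == 0 else 0.0 for j in range(n_in)]

# Assumed 90% weight sparsity.
W_sparse = magnitude_prune(W, 0.9)

dense_macs = n_out * n_in
y, macs = event_driven_matvec(W_sparse, x)
print(f"dense MACs: {dense_macs}, sparse+event MACs: {macs}, "
      f"fraction: {macs / dense_macs:.3f} (expected about 0.2 * 0.1 = 0.02)")
```

Because the pruned weights are spread across columns independently of which units happen to be active, the expected MAC count is the dense count scaled by both keep-fractions. Real hardware savings also depend on how sparse weights and events are stored and scheduled, which this toy count ignores.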