Deep learning has greatly advanced automatic speech recognition (ASR), enabling widespread deployment on edge devices such as smartphones and smart home systems. However, the computational and energy demands of deep neural networks pose significant challenges for such resource-constrained deployments, introducing latency and limiting real-time interaction. Neuromorphic computing offers a promising solution: spiking neural networks (SNNs) and event-driven neural networks introduce activation sparsity, converting dense operations into sparse computations. Yet a study evaluating the hardware benefits of different neuromorphic strategies for ASR is still lacking. This paper explores spiking and event-driven neuromorphic neural networks to improve activation sparsity in the state-of-the-art SpeechMamba model for ASR. We introduce an event-driven SpeechMamba with FATReLU activation, achieving over 60% activation sparsity with less than 1% accuracy degradation on LibriSpeech. We also propose a spiking SpeechMamba that attains over 70% sparsity while using 30% fewer parameters than comparable SNNs. Finally, we develop a cycle-accurate event-driven simulator that enables flexible algorithm-hardware co-exploration, which helps us identify computational bottlenecks and yields over 10% additional efficiency improvements.
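To make the notion of activation sparsity concrete, the following is a minimal sketch of a thresholded ReLU in the spirit of FATReLU, together with a sparsity measurement. It assumes the common formulation in which an input passes through unchanged when it is at least a threshold T and is zeroed otherwise. The threshold value, the sample inputs, and the function names are hypothetical and are not taken from the paper.

```python
from typing import List


def fatrelu(x: float, threshold: float) -> float:
    """Thresholded ReLU (assumed form): keep x only if it is at least `threshold`."""
    return x if x >= threshold else 0.0


def activation_sparsity(values: List[float]) -> float:
    """Fraction of activations that are exactly zero."""
    if not values:
        raise ValueError("values must be non-empty")
    return sum(1 for v in values if v == 0.0) / len(values)


# Hypothetical pre-activation values, chosen only for illustration.
pre_activations = [-0.8, 0.05, 0.3, 0.12, -0.1, 0.9, 0.02, 0.4, 0.15, 0.6]

# With a threshold of 0 the function reduces to a plain ReLU.
relu_out = [fatrelu(v, 0.0) for v in pre_activations]
# A positive threshold (0.2 is an assumed value) also zeroes small positive inputs.
fat_out = [fatrelu(v, 0.2) for v in pre_activations]

relu_sparsity = activation_sparsity(relu_out)
fat_sparsity = activation_sparsity(fat_out)
print(f"ReLU sparsity: {relu_sparsity:.2f}, thresholded sparsity: {fat_sparsity:.2f}")
```

In an event-driven setting, a zero activation generates no event, so a higher fraction of zeros translates into fewer downstream operations. The threshold trades sparsity against accuracy and would need to be tuned for the model at hand.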