Memory-augmented spiking neural networks (SNNs) promise energy-efficient neuromorphic computing, yet their generalization across sensory modalities remains unexplored. We present the first comprehensive cross-modal ablation study of memory mechanisms in SNNs, evaluating Hopfield networks, Hierarchical Gated Recurrent Networks (HGRNs), and supervised contrastive learning (SCL) on visual (N-MNIST) and auditory (SHD) neuromorphic datasets. Our systematic evaluation of five architectures reveals striking modality-dependent performance patterns: Hopfield networks achieve 97.68% accuracy on the visual task but only 76.15% on the auditory task (a 21.53-percentage-point gap), indicating severe modality-specific specialization, whereas SCL delivers more balanced cross-modal performance (96.72% visual, 82.16% audio, a 14.56-point gap). These findings establish that memory mechanisms offer task-specific benefits rather than universal applicability. Joint multi-modal training with HGRN achieves 94.41% visual and 79.37% audio accuracy (86.89% average), matching the parallel per-modality HGRN configuration while enabling unified deployment. Quantitative engram analysis confirms weak cross-modal alignment (0.038 similarity), supporting our parallel architecture design. Our work provides the first empirical evidence for modality-specific memory optimization in neuromorphic systems, with a 603x energy-efficiency advantage over conventional artificial neural networks.
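For concreteness, here is a minimal sketch of how a cross-modal engram similarity score of the kind reported above might be computed, assuming each modality's engram is summarized as the mean hidden-state activation vector of its memory module and the two summaries are compared by cosine similarity; the paper's exact metric may differ, and `engram_cosine_similarity` and its inputs are illustrative names, not the authors' code.

```python
import numpy as np

def engram_cosine_similarity(visual_engrams: np.ndarray,
                             audio_engrams: np.ndarray) -> float:
    """Cosine similarity between mean engram activation vectors.

    visual_engrams, audio_engrams: (num_samples, hidden_dim) arrays of
    hidden-state activations recorded from each modality's memory module.
    """
    v = visual_engrams.mean(axis=0)  # average visual engram
    a = audio_engrams.mean(axis=0)   # average auditory engram
    # Small epsilon guards against division by zero for all-zero activations.
    return float(np.dot(v, a) / (np.linalg.norm(v) * np.linalg.norm(a) + 1e-12))

# Toy example with random activations (real engrams would be recorded
# from the trained SNN); unaligned random vectors score near zero.
rng = np.random.default_rng(0)
vis = rng.standard_normal((100, 256))
aud = rng.standard_normal((100, 256))
print(engram_cosine_similarity(vis, aud))
```

Under this reading, a score near 0 (such as the reported 0.038) indicates that the two modalities occupy nearly orthogonal representational subspaces, which is the property motivating the parallel per-modality design.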