Temporal Regularization Training: Unleashing the Potential of Spiking Neural Networks

Spiking Neural Networks (SNNs) have received widespread attention for their event-driven, low-power characteristics, which make them particularly effective for processing neuromorphic data. Recent studies have shown that directly trained SNNs suffer from severe temporal gradient vanishing and overfitting, which fundamentally constrain their performance and generalizability. This paper introduces a temporal regularization training (TRT) method, designed to unleash the generalization and performance potential of SNNs through a time-decaying regularization mechanism that imposes stronger constraints on early timesteps. We perform a theoretical analysis to show that TRT can mitigate temporal gradient vanishing. To validate the effectiveness of TRT, we conduct experiments on both static image datasets and dynamic neuromorphic datasets and analyze the results, demonstrating that TRT can effectively mitigate overfitting and help SNNs converge to flatter local minima with better generalizability. Furthermore, we establish a theoretical interpretation of TRT's temporal regularization mechanism by analyzing the temporal information dynamics inside SNNs. We track the Fisher information of SNNs during training and show that it progressively concentrates in early timesteps. The time-decaying regularization mechanism implemented in TRT effectively guides the network to learn robust features in these information-rich early timesteps, thereby leading to significant improvements in model generalization.

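
The idea of a time-decaying regularization that constrains early timesteps more strongly can be illustrated with a small sketch. The abstract does not give the decay schedule or say which quantity is regularized, so everything specific here is an assumption made for illustration only: the exponential schedule `lam0 * decay**t`, the mean squared logit used as the penalty, and the hypothetical names `decay_weights`, `softmax_cross_entropy`, and `time_decayed_loss`. It should not be read as the TRT loss itself.

```python
import math


def decay_weights(num_steps, lam0=0.1, decay=0.5):
    """Assumed schedule: lam_t = lam0 * decay**t, so t = 0 gets the largest weight."""
    return [lam0 * decay ** t for t in range(num_steps)]


def softmax_cross_entropy(logits, target):
    """Cross-entropy of one logit vector against an integer class label."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def time_decayed_loss(logits_per_step, target, lam0=0.1, decay=0.5):
    """Mean over timesteps of (task loss + lam_t * penalty).

    The penalty is the mean squared logit at that timestep, an
    illustrative stand-in for whatever the method actually regularizes.
    """
    weights = decay_weights(len(logits_per_step), lam0, decay)
    total = 0.0
    for lam_t, logits in zip(weights, logits_per_step):
        penalty = sum(x * x for x in logits) / len(logits)
        total += softmax_cross_entropy(logits, target) + lam_t * penalty
    return total / len(logits_per_step)


# Toy example: per-timestep outputs of a 3-class network over 3 timesteps.
outputs = [[2.0, 0.5, -1.0], [1.5, 0.2, -0.3], [1.0, 0.1, 0.0]]
print(decay_weights(3))
print(time_decayed_loss(outputs, target=0))
```

With this schedule the penalty at `t = 0` is weighted twice as heavily as at `t = 1`, and setting `lam0=0.0` recovers the plain per-timestep cross-entropy average. In a real training loop the same weighting would be applied to differentiable tensors rather than Python lists.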