AI systems on edge devices require online continual learning (adapting to non-stationary data streams and previously unseen classes without catastrophic forgetting) under strict power constraints. We present CLP-SNN, a spiking neural network with a self-normalizing local learning rule and a spike-driven neural state machine for autonomous on-chip learning, implemented on Intel's Loihi 2 neuromorphic processor. In few-shot experiments on OpenLORIS, CLP-SNN matches the accuracy of replay-based methods while remaining rehearsal-free. On Loihi 2, CLP-SNN achieves 113x lower latency (0.33 ms vs. 37.3 ms) and 6,600x lower energy (0.05 mJ vs. 333 mJ) than the strongest edge-GPU baseline. This gain decomposes into algorithmic efficiency (~14.5x latency, ~22.6x energy when run on the same GPU) and neuromorphic hardware co-design (~7.8x latency, ~295x energy), which exploits event-driven learning and sparse graded-spike communication. We show that co-designed brain-inspired algorithms and neuromorphic hardware can break the traditional accuracy-efficiency trade-off in edge AI.
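As a quick sanity check on the reported decomposition, the sketch below recomputes the end-to-end gains from the quoted measurements and compares them with the product of the two partial factors. It assumes the decomposition is multiplicative (algorithmic gain times hardware gain), which the abstract does not state explicitly but which the rounded figures are consistent with. All variable and function names are illustrative and do not come from the paper.

```python
# Figures as quoted in the abstract (already rounded by the authors).
GPU_LATENCY_MS, LOIHI_LATENCY_MS = 37.3, 0.33
GPU_ENERGY_MJ, LOIHI_ENERGY_MJ = 333.0, 0.05

# Reported partial factors: algorithmic gain on the same GPU,
# then the additional gain from neuromorphic hardware co-design.
ALGO_LATENCY, HW_LATENCY = 14.5, 7.8
ALGO_ENERGY, HW_ENERGY = 22.6, 295.0


def improvement(baseline: float, ours: float) -> float:
    """Return how many times smaller `ours` is than `baseline`."""
    return baseline / ours


def relative_gap(a: float, b: float) -> float:
    """Relative difference between two positive quantities."""
    return abs(a - b) / max(a, b)


# End-to-end gains computed directly from the measurements.
latency_gain = improvement(GPU_LATENCY_MS, LOIHI_LATENCY_MS)
energy_gain = improvement(GPU_ENERGY_MJ, LOIHI_ENERGY_MJ)

# Assumption: the two partial factors combine multiplicatively.
latency_product = ALGO_LATENCY * HW_LATENCY
energy_product = ALGO_ENERGY * HW_ENERGY

if __name__ == "__main__":
    print(f"latency: measured {latency_gain:.1f}x, product {latency_product:.1f}x")
    print(f"energy:  measured {energy_gain:.0f}x, product {energy_product:.0f}x")
    print(f"latency gap: {relative_gap(latency_gain, latency_product):.2%}")
    print(f"energy gap:  {relative_gap(energy_gain, energy_product):.2%}")
```

Under this assumption both products land within about 1% of the directly computed ratios, so the headline 113x and 6,600x figures are best read as rounded values of the same quantities.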