Developing asynchronous neuromorphic hardware that meets the demands of diverse real-life edge scenarios remains a significant challenge. The hardware must operate within tight resource and power budgets while still delivering real-time responsiveness and reliable inference accuracy. In addition, existing system-level simulators for asynchronous neuromorphic hardware suffer from long runtimes. To address these challenges, we propose an Asynchronous Neuromorphic algorithm/hardware Co-Exploration Framework (ANCoEF), which comprises a multi-objective reinforcement learning (RL)-based hardware architecture optimization method and a fully asynchronous simulator (TrueAsync) that runs more than 2 times faster than the state-of-the-art (SOTA) simulator. Our experimental results show that the RL-based hardware architecture optimization approach of ANCoEF outperforms the SOTA method, reducing the hardware energy-delay product (EDP) by 1.81 times with 2.73 times less search time on the N-MNIST dataset, and that the co-exploration framework of ANCoEF improves SNN accuracy by 9.72% and reduces hardware EDP by 28.85 times compared to the SOTA work on the DVS128Gesture dataset. Furthermore, the ANCoEF framework is evaluated on the external neuromorphic dataset CIFAR10-DVS and on static datasets including CIFAR10, CIFAR100, SVHN, and Tiny-ImageNet. For instance, after 26.23 thread-hours of co-exploration, the result on the CIFAR10-DVS dataset achieves an SNN accuracy of 98.48% with a hardware EDP of 0.54 s·nJ per sample.
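The abstract reports results as an energy-delay product (EDP) and as "N times" reductions without spelling out the arithmetic. The sketch below assumes the conventional definition, EDP = energy × delay, which is consistent with the s·nJ unit quoted above. The function names and all numeric values are hypothetical illustrations, not figures from the paper.

```python
def edp(energy_nj: float, delay_s: float) -> float:
    """Energy-delay product in s*nJ, assuming EDP = energy * delay."""
    if energy_nj < 0 or delay_s < 0:
        raise ValueError("energy and delay must be non-negative")
    return energy_nj * delay_s


def reduction_factor(baseline_edp: float, improved_edp: float) -> float:
    """How many times smaller the improved EDP is than the baseline."""
    if improved_edp <= 0:
        raise ValueError("improved EDP must be positive")
    return baseline_edp / improved_edp


# Hypothetical per-sample measurements for two hardware configurations.
baseline = edp(energy_nj=12.0, delay_s=0.5)    # baseline design
candidate = edp(energy_nj=6.0, delay_s=0.25)   # explored design
factor = reduction_factor(baseline, candidate)

print(f"baseline EDP:  {baseline} s*nJ")
print(f"candidate EDP: {candidate} s*nJ")
print(f"reduction:     {factor}x")
```

Because EDP multiplies the two quantities, halving both energy and delay yields a 4 times reduction, which is why EDP improvements can be much larger than the improvement in either metric alone.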