As artificial intelligence plays an increasingly pivotal role in modern society, the efficient training and deployment of deep neural networks have become critical areas of focus. Recent advances in large attention-based architectures have spurred the development of AI accelerators that make it feasible to train models with billions of parameters. Despite their effectiveness, these networks often incur high execution costs in production environments. Neuromorphic computing, inspired by biological neural processes, offers a promising alternative. By exploiting temporally sparse computation, Spiking Neural Networks (SNNs) promise better energy efficiency through a smaller, low-power hardware footprint. However, SNNs can be challenging to train: their recurrent nature makes it harder to leverage the massive parallelism of modern AI accelerators. To facilitate the investigation of SNN architectures and dynamics, researchers have sought to bridge Python-based deep learning frameworks such as PyTorch or TensorFlow with custom-implemented compute kernels. This paper introduces Spyx, a new, lightweight SNN simulation and optimization library built on JAX. By pre-staging data in the large VRAM of contemporary accelerators and employing extensive JIT compilation, Spyx allows SNN optimization to be executed as a single, unified low-level program on NVIDIA GPUs or Google TPUs. This approach achieves optimal hardware utilization, surpassing the performance of many existing SNN training frameworks while maintaining considerable flexibility.
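To make the idea of pre-staging data and compiling the whole optimization into one program more concrete, the following is a minimal sketch in plain JAX. It does not use or reproduce the Spyx API; the layer sizes, the leaky integrate-and-fire update with soft reset, the fast-sigmoid surrogate gradient, and every function name (`spike`, `forward`, `train_epoch`) are assumptions chosen for illustration. The time recurrence of the neurons and the loop over mini-batches are both expressed with `jax.lax.scan`, so a single `jax.jit` call compiles the entire epoch for the accelerator.

```python
import jax
import jax.numpy as jnp

# Hypothetical sizes and hyperparameters, chosen only for illustration.
T, B, N_IN, N_OUT = 20, 8, 16, 4      # time steps, batch size, inputs, outputs
N_BATCHES, BETA, THRESH, LR = 5, 0.9, 1.0, 0.1


@jax.custom_vjp
def spike(v):
    # Forward pass: Heaviside step, 1.0 where v is above zero.
    return (v > 0).astype(jnp.float32)


def spike_fwd(v):
    return spike(v), v


def spike_bwd(v, g):
    # Backward pass: fast-sigmoid surrogate derivative (an assumed choice).
    return (g / (1.0 + jnp.abs(v)) ** 2,)


spike.defvjp(spike_fwd, spike_bwd)


def forward(w, x):
    # x has shape (T, B, N_IN); the membrane potential is the scan carry.
    def step(v, x_t):
        v = BETA * v + x_t @ w          # leaky integration of input current
        s = spike(v - THRESH)           # emit a spike above threshold
        v = v - s * THRESH              # soft reset by subtracting threshold
        return v, s

    v0 = jnp.zeros((x.shape[1], w.shape[1]))
    _, spikes = jax.lax.scan(step, v0, x)
    return spikes.sum(axis=0)           # spike counts, shape (B, N_OUT)


def loss_fn(w, x, y):
    # Cross-entropy on output spike counts treated as logits.
    logp = jax.nn.log_softmax(forward(w, x))
    return -jnp.mean(jnp.take_along_axis(logp, y[:, None], axis=1))


@jax.jit
def train_epoch(w, xs, ys):
    # xs: (N_BATCHES, T, B, N_IN), ys: (N_BATCHES, B), already on the device.
    def body(w, batch):
        x, y = batch
        loss, grad = jax.value_and_grad(loss_fn)(w, x, y)
        return w - LR * grad, loss      # plain SGD step

    return jax.lax.scan(body, w, (xs, ys))


# Synthetic data, placed on the default device once, before training starts.
key_x, key_y, key_w = jax.random.split(jax.random.PRNGKey(0), 3)
xs = jax.device_put(
    jax.random.bernoulli(key_x, 0.3, (N_BATCHES, T, B, N_IN)).astype(jnp.float32)
)
ys = jax.device_put(jax.random.randint(key_y, (N_BATCHES, B), 0, N_OUT))
w = 0.5 * jax.random.normal(key_w, (N_IN, N_OUT))

w, losses = train_epoch(w, xs, ys)      # one compiled call covers the epoch
```

Because the dataset already resides in device memory and the batch loop lives inside the compiled function, no host-to-device transfer or Python dispatch occurs between training steps in this sketch. Whether this matches the exact mechanism used by the library is not stated in the text above.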