Neuromorphic computers hold the potential to vastly improve the speed and efficiency of a wide range of computational kernels with their asynchronous, compute-memory co-located, spatially distributed, and scalable nature. However, performance models that are simple yet sufficiently expressive to predict runtime on actual neuromorphic hardware are lacking, posing a challenge for researchers and developers who strive to design fast algorithms and kernels. As breaking the memory bandwidth wall of conventional von Neumann architectures is a primary neuromorphic advantage, modeling communication time is especially important. At the same time, modeling communication time is difficult, as complex congestion patterns arise in a heavily loaded Network-on-Chip. In this work, we introduce the first max-affine lower-bound runtime model -- a multi-dimensional roofline model -- for Intel's Loihi 2 neuromorphic chip that quantitatively accounts for both compute and communication based on a suite of microbenchmarks. Despite being a lower-bound model, we observe a tight correspondence (Pearson correlation coefficient greater than or equal to 0.97) between our model's estimated runtime and the measured runtime on Loihi 2 for a neural network linear layer, i.e., matrix-vector multiplication, and for an example application, a Quadratic Unconstrained Binary Optimization solver. Furthermore, we derive analytical expressions for communication-bottlenecked runtime to study the scalability of the linear layer, revealing an area-runtime tradeoff for different spatial workload configurations, with linear to superlinear runtime scaling in layer size under a variety of constant factors. Our max-affine runtime model helps empower the design of high-speed algorithms and kernels for Loihi 2.
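To make the central concept concrete, the following is a minimal sketch of what a max-affine lower-bound runtime model looks like in general: the predicted runtime is the maximum over several affine cost terms (e.g., one per resource such as compute and communication), which is the multi-dimensional generalization of a roofline. The workload features, cost coefficients, and term structure below are illustrative assumptions, not the measured Loihi 2 parameters from the paper.

```python
def max_affine_runtime(workload, terms):
    """Lower-bound runtime as the max over affine terms: max_i (a_i . x + b_i).

    workload: dict mapping workload features (e.g. op count, message count)
              to their values for a given kernel instance.
    terms:    list of (coeffs, intercept) pairs, one per resource; coeffs
              maps a feature name to its per-unit cost (hypothetical values).
    """
    return max(
        sum(coeffs[k] * workload[k] for k in coeffs) + intercept
        for coeffs, intercept in terms
    )


# Two illustrative affine terms: a compute term and a communication term.
# Coefficients are made up for demonstration.
terms = [
    ({"ops": 0.5}, 10.0),   # compute: 0.5 time units per op + fixed overhead
    ({"msgs": 2.0}, 5.0),   # communication: 2.0 per message + fixed overhead
]

# A small, compute-bound workload: the compute term dominates.
t_small = max_affine_runtime({"ops": 100, "msgs": 10}, terms)   # max(60, 25) = 60

# Scaling up only the communication load flips the bottleneck.
t_large = max_affine_runtime({"ops": 100, "msgs": 50}, terms)   # max(60, 105) = 105
```

Because the model is a pointwise maximum of affine functions, it is convex in the workload features, and the active term at a given workload identifies which resource (compute or Network-on-Chip communication) bounds the kernel in that regime.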