Neuromorphic computers hold the potential to vastly improve the speed and efficiency of a wide range of computational kernels with their asynchronous, compute-memory co-located, spatially distributed, and scalable nature. However, performance models that are simple yet sufficiently expressive to predict runtime on actual neuromorphic hardware are lacking, posing a challenge for researchers and developers who strive to design fast algorithms and kernels. As breaking the memory bandwidth wall of conventional von Neumann architectures is a primary neuromorphic advantage, modeling communication time is especially important. At the same time, modeling communication time is difficult, as complex congestion patterns arise in a heavily loaded Network-on-Chip. In this work, we introduce the first max-affine lower-bound runtime model -- a multi-dimensional roofline model -- for Intel's Loihi 2 neuromorphic chip that quantitatively accounts for both compute and communication based on a suite of microbenchmarks. Despite being a lower-bound model, we observe a tight correspondence (Pearson correlation coefficient greater than or equal to 0.97) between our model's estimated runtime and the measured runtime on Loihi 2 for a neural network linear layer, i.e., matrix-vector multiplication, and for an example application, a Quadratic Unconstrained Binary Optimization solver. Furthermore, we derive analytical expressions for communication-bottlenecked runtime to study the scalability of the linear layer, revealing an area-runtime tradeoff for different spatial workload configurations, with linear to superlinear runtime scaling in layer size under a variety of constant factors. Our max-affine runtime model helps empower the design of high-speed algorithms and kernels for Loihi 2.
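To make the central concept concrete, the following is a minimal sketch of what a max-affine lower-bound runtime model looks like in general: the predicted runtime is the maximum over several affine cost terms (e.g., one per resource such as compute and communication), which is the multi-dimensional generalization of a roofline. The workload features, cost coefficients, and term structure below are illustrative assumptions, not the measured Loihi 2 parameters from the paper.

```python
def max_affine_runtime(workload, terms):
    """Lower-bound runtime as the max over affine terms: max_i (a_i . x + b_i).

    workload: dict mapping workload features (e.g. op count, message count)
              to their values for a given kernel instance.
    terms:    list of (coeffs, intercept) pairs, one per resource; coeffs
              maps a feature name to its per-unit cost (hypothetical values).
    """
    return max(
        sum(coeffs[k] * workload[k] for k in coeffs) + intercept
        for coeffs, intercept in terms
    )


# Two illustrative affine terms: a compute term and a communication term.
# Coefficients are made up for demonstration.
terms = [
    ({"ops": 0.5}, 10.0),   # compute: 0.5 time units per op + fixed overhead
    ({"msgs": 2.0}, 5.0),   # communication: 2.0 per message + fixed overhead
]

# A small, compute-bound workload: the compute term dominates.
t_small = max_affine_runtime({"ops": 100, "msgs": 10}, terms)   # max(60, 25) = 60

# Scaling up only the communication load flips the bottleneck.
t_large = max_affine_runtime({"ops": 100, "msgs": 50}, terms)   # max(60, 105) = 105
```

Because the model is a pointwise maximum of affine functions, it is convex in the workload features, and the active term at a given workload identifies which resource (compute or Network-on-Chip communication) bounds the kernel in that regime.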