The accuracy of neural networks has improved greatly across various domains in recent years. Their ever-increasing complexity, however, leads to prohibitively high energy demands and latency on von Neumann systems. Several computing-in-memory (CIM) systems have recently been proposed to address this, but trade-offs among accuracy, hardware reliability, and scalability to large models remain a challenge. In addition, in some CIM designs, moving activations still costs considerable time and energy. This paper explores the combination of algorithmic optimizations for ternary weight neural networks with associative processors (APs) implemented using racetrack memory (RTM). We propose a novel compilation flow that optimizes convolutions on APs by reducing their arithmetic intensity. By leveraging the benefits of RTM-based APs, this approach substantially reduces data transfers within the memory while addressing accuracy, energy efficiency, and reliability concerns. Concretely, our solution improves the energy efficiency of ResNet-18 inference on ImageNet by 7.5x compared to crossbar in-memory accelerators while retaining software accuracy.
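As a rough illustration of why ternary weights suit add-oriented in-memory hardware, the sketch below computes a small convolution using only additions and subtractions. It is not the compilation flow proposed in the paper and does not model APs or RTM. It only shows the general property that weights restricted to {-1, 0, +1} remove the need for multiplications. The function names `ternary_dot` and `ternary_conv2d` are hypothetical and chosen for this example.

```python
import numpy as np


def ternary_dot(activations, weights):
    """Dot product with weights in {-1, 0, +1}, using only adds and subtracts.

    Illustrative assumption: activations and weights have the same shape.
    """
    activations = np.asarray(activations)
    weights = np.asarray(weights)
    if not np.isin(weights, [-1, 0, 1]).all():
        raise ValueError("weights must be ternary: -1, 0, or +1")
    # Add activations paired with +1, subtract those paired with -1;
    # activations paired with 0 are skipped entirely.
    positive_sum = activations[weights == 1].sum()
    negative_sum = activations[weights == -1].sum()
    return positive_sum - negative_sum


def ternary_conv2d(image, kernel):
    """Stride-1, no-padding 2D cross-correlation with a ternary kernel."""
    image = np.asarray(image)
    kernel = np.asarray(kernel)
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=image.dtype)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = ternary_dot(image[i:i + kh, j:j + kw], kernel)
    return out
```

For instance, `ternary_conv2d(np.arange(16).reshape(4, 4), np.array([[1, 0], [-1, 1]]))` gives the same result as a conventional multiply-accumulate convolution over the same inputs, but every output element is formed from sums and differences of activations alone.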