Pruning random resistive memory for optimizing analogue AI

Yi Li, Songqi Wang, Yaping Zhao, Shaocong Wang, Woyu Zhang, Yangu He, Ning Lin, Binbin Cui, Xi Chen, Shiming Zhang, Hao Jiang, Peng Lin, Xumeng Zhang, Xiaojuan Qi, Zhongrui Wang, Xiaoxin Xu, Dashan Shang, Qi Liu, Kwang-Ting Cheng, Ming Liu

The rapid advancement of artificial intelligence (AI) has been marked by large language models that exhibit human-like intelligence. However, these models also pose unprecedented challenges for energy consumption and environmental sustainability. One promising solution is to revisit analogue computing, a technique that predates digital computing, using emerging analogue electronic devices such as resistive memory, which offers in-memory computing, high scalability, and nonvolatility. However, analogue computing still faces the same challenges as before: programming nonidealities and costly programming, both rooted in the underlying device physics. Here, we report a universal solution: software-hardware co-design that uses structural plasticity-inspired edge pruning to optimize the topology of a randomly weighted analogue resistive memory neural network. Software-wise, the topology of a randomly weighted neural network is optimized by pruning connections rather than by precisely tuning resistive memory weights. Hardware-wise, we reveal the physical origin of the programming stochasticity using transmission electron microscopy, and leverage this stochasticity for large-scale, low-cost implementation of an overparameterized random neural network containing high-performance sub-networks. We implemented the co-design on a 40 nm 256K resistive memory macro, observing 17.3% and 19.9% accuracy improvements in image and audio classification on the FashionMNIST and Spoken digits datasets, respectively, as well as a 9.8% (2%) improvement in PR (ROC) in image segmentation on the DRIVE dataset. This is accompanied by 82.1%, 51.2%, and 99.8% improvements in energy efficiency, respectively, thanks to analogue in-memory computing. By embracing the intrinsic stochasticity and in-memory computing, this work may solve the biggest obstacle of analogue computing systems and thus unleash their immense potential for next-generation AI hardware.
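
To make the software side of the idea concrete, the following is a minimal sketch of pruning a randomly weighted layer instead of tuning its weights. It is an illustration under stated assumptions, not the authors' procedure: it assumes each edge carries a learnable score, the mask keeps the top-scoring edges, and the scores are updated with a straight-through gradient estimate. The names `topk_mask`, `forward`, and `train_scores` are hypothetical, and the random weights stand in for as-programmed stochastic resistive memory conductances.

```python
import random


def topk_mask(scores, keep):
    """Return a 0/1 mask that keeps the `keep` highest-scoring edges."""
    ranked = sorted(
        ((s, i, j) for i, row in enumerate(scores) for j, s in enumerate(row)),
        reverse=True,
    )
    mask = [[0] * len(row) for row in scores]
    for _, i, j in ranked[:keep]:
        mask[i][j] = 1
    return mask


def forward(weights, mask, x):
    """Masked matrix-vector product: pruned edges contribute nothing."""
    return [
        sum(w * m * xj for w, m, xj in zip(w_row, m_row, x))
        for w_row, m_row in zip(weights, mask)
    ]


def train_scores(weights, data, keep, lr=0.1, epochs=50, seed=0):
    """Learn which edges to keep; the random weights are never modified."""
    rng = random.Random(seed)
    scores = [[rng.uniform(0.0, 1.0) for _ in row] for row in weights]
    for _ in range(epochs):
        for x, target in data:
            mask = topk_mask(scores, keep)
            out = forward(weights, mask, x)
            for i, (o, t) in enumerate(zip(out, target)):
                err = o - t  # gradient of 0.5 * (o - t)**2 with respect to o
                for j, xj in enumerate(x):
                    # Straight-through assumption: the top-k step is treated
                    # as the identity when passing the gradient to the scores.
                    scores[i][j] -= lr * err * weights[i][j] * xj
    return topk_mask(scores, keep)


# Hypothetical toy usage: 2 outputs, 8 inputs, keep half of the 16 edges.
rng = random.Random(1)
weights = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(2)]
data = [([rng.uniform(-1.0, 1.0) for _ in range(8)], [0.0, 1.0]) for _ in range(20)]
mask = train_scores(weights, data, keep=8)
```

In this sketch only the binary mask changes during training, which mirrors the motivation stated above: removing an edge is a coarse operation, whereas precisely tuning an analogue weight is where programming nonidealities and cost arise.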
