Computing-in-memory (CIM) is an emerging computing paradigm that offers notable potential for accelerating neural networks with high parallelism, low latency, and energy efficiency compared to conventional von Neumann architectures. However, existing research has primarily focused on hardware architecture and network co-design for large-scale neural networks, without considering resource constraints. In this study, we aim to develop edge-friendly deep neural networks (DNNs) for accelerators based on resistive random-access memory (RRAM). To this end, we propose an edge compilation and resource-constrained RRAM-aware neural architecture search (NAS) framework that searches for optimized neural networks satisfying specific hardware constraints. Our compilation approach integrates layer partitioning, duplication, and network packing to maximize the utilization of computation units. The resulting network architecture can be optimized for either high accuracy or low latency using a one-shot neural network approach, with Pareto optimality achieved through the Non-dominated Sorting Genetic Algorithm II (NSGA-II). Compiling mobile-friendly networks such as SqueezeNet and MobileNetV3-Small achieves over 80% utilization and over a 6x speedup compared to an ISAAC-like framework across different crossbar resources. The NAS-derived model optimized for speed achieves a 5x-30x speedup. The code for this paper is available at https://github.com/ArChiiii/rram_nas_comp_pack.
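The accuracy-latency trade-off above is resolved by selecting Pareto-optimal architectures. A minimal sketch of non-dominated sorting, the selection core of NSGA-II, is shown below; the candidate (error, latency) pairs are hypothetical illustrations, not results from the paper:

```python
def dominates(a, b):
    """True if candidate a is no worse than b in every objective
    and strictly better in at least one (lower is better)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(population):
    """Return the first non-dominated front: candidates that no
    other candidate dominates (the Pareto-optimal set)."""
    return [p for p in population
            if not any(dominates(q, p) for q in population if q != p)]

# Hypothetical (error, latency) pairs for candidate architectures.
candidates = [(0.10, 8.0), (0.12, 5.0), (0.08, 12.0),
              (0.12, 9.0), (0.15, 4.0)]
front = pareto_front(candidates)
# (0.12, 9.0) is excluded: (0.10, 8.0) is better in both objectives.
```

NSGA-II iterates this sorting with crowding-distance selection, crossover, and mutation, so the final front offers a spectrum of networks from most accurate to fastest under the hardware constraints.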