Recently, in-memory analog matrix computing (AMC) with nonvolatile resistive memory has been developed to solve matrix problems in one step, e.g., matrix inversion for solving linear systems. However, the analog nature of AMC limits its scalability, owing to constraints on the manufacturability and yield of resistive memory arrays, device and circuit non-idealities, and the cost of hardware implementation. To deliver a scalable AMC approach for solving linear systems, this work presents BlockAMC, which partitions a large original matrix into smaller block matrices mapped onto different memory arrays. A macro is designed to perform matrix inversion and matrix-vector multiplication with the block matrices, producing partial solutions from which the original solution is recovered. The size of the block matrices can be reduced exponentially by applying multiple stages of divide-and-conquer, which leads to a two-stage solver design that enhances the scalability of the approach. BlockAMC also alleviates the accuracy issue of AMC, especially in the presence of device and circuit non-idealities such as conductance variations and interconnect resistances. Compared to a single AMC circuit solving the same problem, BlockAMC improves area and energy efficiency by 48.83% and 40%, respectively.
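The abstract does not spell out the partitioning algorithm, so the following is only a minimal numerical sketch of one standard way to solve a linear system from block matrices: a 2x2 partition with a Schur complement, using nothing but inversions and matrix-vector multiplications on the blocks. The function name `block_solve`, the `split` parameter, and the use of NumPy's digital solver in place of an analog inversion circuit are all assumptions made for illustration, not the paper's implementation.

```python
import numpy as np


def block_solve(A, b, split):
    """Solve A x = b from a 2x2 block partition (illustrative sketch only).

    Each np.linalg.solve call stands in for an analog inversion (INV) on one
    block; each `@` with a vector stands in for a matrix-vector multiplication
    (MVM). This is an assumed textbook scheme, not the BlockAMC circuit itself.
    """
    A11, A12 = A[:split, :split], A[:split, split:]
    A21, A22 = A[split:, :split], A[split:, split:]
    b1, b2 = b[:split], b[split:]

    # INV on A11: partial solution y = A11^{-1} b1, plus Z = A11^{-1} A12
    y = np.linalg.solve(A11, b1)
    Z = np.linalg.solve(A11, A12)

    # Schur complement of A11 in A: S = A22 - A21 A11^{-1} A12
    S = A22 - A21 @ Z

    # INV on S gives the lower part of the solution
    x2 = np.linalg.solve(S, b2 - A21 @ y)

    # Back-substitute to recover the upper part
    x1 = y - Z @ x2
    return np.concatenate([x1, x2])
```

Under these assumptions, an 8x8 system can be solved with `block_solve(A, b, 4)`, which touches only 4x4 blocks. Applying the same partition again inside each block inversion halves the block size once more, which is one way to read the multi-stage divide-and-conquer mentioned above.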