The increasing scale of neural networks needed to support more complex applications has driven demand for area- and energy-efficient hardware. One route to meeting the budget for these applications is to circumvent the von Neumann bottleneck by performing computation in or near memory. When neural networks are transferred onto such hardware, non-idealities such as device-to-device variation and imperfect device yield inevitably degrade performance. Methods such as hardware-aware training, in which substrate non-idealities are incorporated during network training, are one way to recover performance at the cost of solution generality. In this work, we demonstrate inference on hardware neural networks consisting of arrays of 20,000 magnetic tunnel junctions integrated on complementary metal-oxide-semiconductor chips, a platform that closely resembles market-ready spin-transfer-torque magnetoresistive random-access memory technology. Using 36 dies, each containing a crossbar array with its own non-idealities, we show that even a small number of defects in physically mapped networks significantly degrades the performance of networks trained without defects. We also show that, at the cost of generality, hardware-aware training that accounts for the specific defects on each die can recover performance comparable to that of ideal networks. We then demonstrate a robust training method that extends hardware-aware training to statistics-aware training, producing network weights that perform well on most defective dies regardless of their specific defect locations. When evaluated on the 36 physical dies, statistics-aware trained solutions achieve a mean misclassification error on the MNIST dataset that differs from the software baseline by only 2%. This statistics-aware training method could be generalized to networks with many layers that are mapped to hardware suited for industry-ready applications.
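To make the distinction between hardware-aware and statistics-aware training concrete, the following is a minimal sketch of the statistics-aware idea: instead of training against the one measured defect map of a specific die, a fresh random defect mask is sampled on every training batch, so the learned weights tolerate defects wherever they land. The defect rate, the stuck-at-zero defect model, and the two-layer perceptron are illustrative assumptions, not values taken from the measurements reported here.

```python
import torch
import torch.nn as nn

# Assumed defect statistics for illustration: a small fraction of devices
# on any given die are stuck, modeled here as stuck-at-zero weights.
DEFECT_RATE = 0.02


class DefectiveLinear(nn.Linear):
    """Linear layer whose weights are corrupted by a freshly sampled
    stuck-at-zero defect mask on every forward pass during training."""

    def forward(self, x):
        if self.training:
            # Resample the defect mask each batch: statistics-aware training.
            # (Hardware-aware training would instead reuse one fixed mask
            # measured from a specific die.)
            mask = (torch.rand_like(self.weight) > DEFECT_RATE).float()
            return nn.functional.linear(x, self.weight * mask, self.bias)
        return super().forward(x)


# A small MNIST-shaped perceptron (784 inputs, 10 classes).
model = nn.Sequential(DefectiveLinear(784, 128), nn.ReLU(), DefectiveLinear(128, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One illustrative training step on random stand-in data.
x, y = torch.randn(64, 784), torch.randint(0, 10, (64,))
opt.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
opt.step()
```

Because the mask changes from batch to batch, no single weight can carry information that the network cannot afford to lose, which is why one set of trained weights can then be deployed across many defective dies without per-die retraining.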