Recent advances in machine learning (ML) have spotlighted the pressing need for computing architectures that bridge the gap between memory bandwidth and processing power. The advent of deep neural networks has pushed traditional Von Neumann architectures to their limits, owing to the high latency and energy costs of moving data between the processor and memory for these workloads. One way to overcome this bottleneck is to perform computation within the main memory through processing-in-memory (PIM), thereby limiting data movement and its associated costs. However, DRAM-based PIM struggles to achieve high throughput and energy efficiency due to internal data-movement bottlenecks and the need for frequent refresh operations. In this work, we introduce OPIMA, a PIM-based ML accelerator architected within an optical main memory. OPIMA is designed to exploit the inherent massive parallelism within main memory while performing high-speed, low-energy optical computation to accelerate ML models based on convolutional neural networks. We present a comprehensive analysis of OPIMA to guide its design choices and operational mechanisms. We also evaluate the performance and energy consumption of OPIMA against conventional electronic computing systems and emerging photonic PIM architectures. Experimental results show that OPIMA achieves 2.98x higher throughput and 137x better energy efficiency than the best-known prior work.