Several microring resonator (MRR) based analog photonic architectures have been proposed to accelerate general matrix-matrix multiplications (GEMMs) in deep neural networks with exceptional throughput and energy efficiency. To implement GEMM functions, these MRR-based architectures generally manipulate optical signals in five ways: (i) splitting (copying) of multiple optical signals to achieve a certain fan-out, (ii) aggregation (multiplexing) of multiple optical signals to achieve a certain fan-in, (iii) modulation of optical signals to imprint input values onto the analog signal amplitude, (iv) weighting of modulated optical signals to achieve analog input-weight multiplication, and (v) summation of optical signals. Existing MRR-based GEMM accelerators perform the first four manipulations in an arbitrary order, ignoring the possible impact of this order on their performance. In this paper, we conduct a detailed analysis of accelerator organizations with three different orders of these manipulations: (1) Modulation-Aggregation-Splitting-Weighting (MASW), (2) Aggregation-Splitting-Modulation-Weighting (ASMW), and (3) Splitting-Modulation-Weighting-Aggregation (SMWA). We show that these organizations incur different magnitudes of crosstalk noise and optical signal loss, which in turn gives them different levels of processing parallelism at the circuit level and different throughput and area-energy efficiency at the system level. Our evaluation results for four CNN models show that, on average, the SMWA organization achieves up to 4.4$\times$, 5$\times$, and 5.2$\times$ better throughput, energy efficiency, and area-energy efficiency, respectively, compared to the ASMW and MASW organizations.
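As a rough illustration of the terminology above, the following sketch lists the three stage orders as named in the abstract and computes the ideal result that any of them is meant to produce. It is a minimal, purely numerical model, not the paper's method: it assumes unit-amplitude carriers and no optical loss or crosstalk, which are exactly the non-idealities the paper analyzes. The names `ORGANIZATIONS` and `ideal_dot_products` are hypothetical and chosen only for this example.

```python
# Stage orders exactly as spelled out in the abstract.
ORGANIZATIONS = {
    "MASW": ("modulation", "aggregation", "splitting", "weighting"),
    "ASMW": ("aggregation", "splitting", "modulation", "weighting"),
    "SMWA": ("splitting", "modulation", "weighting", "aggregation"),
}


def ideal_dot_products(inputs, weight_rows):
    """Idealized reference (assumption: lossless, crosstalk-free signals).

    inputs      -- list of input values, one per optical carrier
    weight_rows -- list of weight vectors, one per output
    Returns one summed value per weight row.
    """
    # (iii) Modulation: imprint each input onto a unit-amplitude carrier.
    modulated = [1.0 * x for x in inputs]
    # (i) Splitting: give every output row its own copy of the signals.
    copies = [list(modulated) for _ in weight_rows]
    # (iv) Weighting: multiply each signal by its weight.
    weighted = [
        [signal * weight for signal, weight in zip(copy, row)]
        for copy, row in zip(copies, weight_rows)
    ]
    # (v) Summation: add the weighted signals of each row.
    return [sum(row) for row in weighted]


if __name__ == "__main__":
    for name, stages in ORGANIZATIONS.items():
        print(name, "->", " -> ".join(stages))
    print(ideal_dot_products([0.5, 0.25], [[1.0, 0.0], [0.5, 0.5]]))
```

In this idealized model the stage order has no effect on the numerical result. Per the abstract, the organizations differ in the crosstalk noise and signal losses they incur, which this sketch deliberately leaves out.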