Several microring resonator (MRR)-based analog photonic architectures have been proposed to accelerate general matrix-matrix multiplications (GEMMs) in deep neural networks with exceptional throughput and energy efficiency. To implement GEMM functions, these MRR-based architectures generally manipulate optical signals in five ways: (i) splitting (copying) optical signals to achieve a certain fan-out, (ii) aggregating (multiplexing) multiple optical signals to achieve a certain fan-in, (iii) modulating optical signals to imprint input values onto the analog signal amplitude, (iv) weighting the modulated optical signals to perform analog input-weight multiplication, and (v) summing the optical signals. Existing MRR-based GEMM accelerators perform the first four manipulations in an arbitrary order, ignoring the possible impact of that order on their performance. In this paper, we conduct a detailed analysis of accelerator organizations with three different orders of these manipulations: (1) Modulation-Aggregation-Splitting-Weighting (MASW), (2) Aggregation-Splitting-Modulation-Weighting (ASMW), and (3) Splitting-Modulation-Weighting-Aggregation (SMWA). We show that these organizations incur crosstalk noise and optical signal losses of different magnitudes. As a result, they support different levels of processing parallelism at the circuit level and achieve different throughput and area-energy efficiency at the system level. Our evaluation results for four CNN models show that the SMWA organization achieves up to 4.4$\times$, 5$\times$, and 5.2$\times$ better throughput, energy efficiency, and area-energy efficiency, respectively, than the ASMW and MASW organizations on average.
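To make the three orderings concrete, the following is a minimal sketch of an idealized, noise-free numeric model. It is not the paper's simulator. All function names (`modulate`, `aggregate`, `split`, `weight`, `detect`, `masw`, `asmw`, `smwa`) are hypothetical. The model makes several assumptions: each waveguide is a list of per-wavelength amplitudes, splitting is a perfect copy (the 1/N power division of a real splitter is ignored), and there is no loss or crosstalk. Under these assumptions all three orders compute the same matrix-vector product $y = Wx$. The sketch therefore shows only the data flow of each organization, since the differences the paper reports come from the losses and crosstalk that this model deliberately omits.

```python
from typing import List

Vector = List[float]
Waveguide = List[float]  # one amplitude per WDM wavelength channel (hypothetical model)


def carriers(n: int) -> List[Waveguide]:
    # n laser wavelengths, each on its own waveguide (one-hot over channels).
    return [[1.0 if k == i else 0.0 for k in range(n)] for i in range(n)]


def modulate(wg: Waveguide, values: Vector) -> Waveguide:
    # Imprint input value i onto the amplitude of wavelength channel i.
    return [a * v for a, v in zip(wg, values)]


def aggregate(wgs: List[Waveguide]) -> Waveguide:
    # Multiplex several waveguides into one WDM waveguide (fan-in).
    return [sum(ch) for ch in zip(*wgs)]


def split(wg: Waveguide, fan_out: int) -> List[Waveguide]:
    # Ideal copy to fan_out branches; real splitters would divide power by fan_out.
    return [list(wg) for _ in range(fan_out)]


def weight(wg: Waveguide, weights: Vector) -> Waveguide:
    # Per-wavelength MRR transmission scales each channel by its weight.
    return [a * w for a, w in zip(wg, weights)]


def detect(wg: Waveguide) -> float:
    # A photodetector sums the contributions of all channels (summation step).
    return sum(wg)


def masw(W: List[Vector], x: Vector) -> Vector:
    # Modulation -> Aggregation -> Splitting -> Weighting.
    n, m = len(x), len(W)
    bus = aggregate([modulate(c, x) for c in carriers(n)])
    return [detect(weight(b, W[j])) for j, b in enumerate(split(bus, m))]


def asmw(W: List[Vector], x: Vector) -> Vector:
    # Aggregation -> Splitting -> Modulation (per branch) -> Weighting.
    n, m = len(x), len(W)
    bus = aggregate(carriers(n))
    return [detect(weight(modulate(b, x), W[j])) for j, b in enumerate(split(bus, m))]


def smwa(W: List[Vector], x: Vector) -> Vector:
    # Splitting -> Modulation -> Weighting -> Aggregation (per output row).
    n, m = len(x), len(W)
    copies = [split(c, m) for c in carriers(n)]  # copies[i][j]: carrier i, branch j
    out = []
    for j in range(m):
        weighted = [weight(modulate(copies[i][j], x), W[j]) for i in range(n)]
        out.append(detect(aggregate(weighted)))
    return out


def reference(W: List[Vector], x: Vector) -> Vector:
    # Plain electronic matrix-vector product for comparison.
    return [sum(w * v for w, v in zip(row, x)) for row in W]


if __name__ == "__main__":
    W = [[0.5, 0.25, 1.0], [0.0, 0.75, 0.5]]
    x = [0.2, 0.4, 0.8]
    for f in (masw, asmw, smwa, reference):
        print(f.__name__, f(W, x))
```

In this sketch the orders differ only in where each operation sits relative to the fan-out. For example, ASMW needs a separate modulator bank on every branch, and SMWA modulates and weights each wavelength copy before multiplexing. These are the placement differences that, in a physical implementation, change how many lossy or crosstalk-prone elements each signal passes through.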