Photonic computing is a compelling avenue for performing highly efficient matrix multiplication, a crucial operation in Deep Neural Networks (DNNs). While this method has shown great success in DNN inference, meeting the high precision demands of DNN training is challenging because of the precision limits imposed by costly data converters and the analog noise inherent in photonic hardware. This paper proposes Mirage, a photonic DNN training accelerator that overcomes the precision challenges of photonic hardware using the Residue Number System (RNS). RNS is a numeral system based on modular arithmetic that allows a high-precision operation to be performed as multiple low-precision modular operations. In this work, we present a novel micro-architecture and dataflow for an RNS-based photonic tensor core that performs modular arithmetic in the analog domain. By combining RNS and photonics, Mirage provides high energy efficiency without compromising precision and can successfully train state-of-the-art DNNs, achieving accuracy comparable to FP32 training. Our study shows that, on average across several DNNs and compared to systolic arrays, Mirage achieves more than $23.8\times$ faster training and $32.1\times$ lower energy-delay product (EDP) in an iso-energy scenario, and consumes $42.8\times$ less power with comparable or better EDP in an iso-area scenario.
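To make the RNS idea concrete, the following is a minimal software sketch of how a dot product can be computed through several small modular channels and then recovered with the Chinese Remainder Theorem. It is an illustration only, not Mirage's micro-architecture or dataflow: the moduli `(31, 32, 33)`, the helper names `to_rns`, `from_rns`, and `rns_dot`, and the restriction to non-negative integers are all assumptions made for this example.

```python
from math import prod

# Hypothetical pairwise-coprime moduli, chosen only for illustration.
# Each residue fits in about 5-6 bits; together they cover 0 .. 31*32*33 - 1.
MODULI = (31, 32, 33)


def to_rns(x, moduli=MODULI):
    """Represent a non-negative integer by its residues modulo each modulus."""
    return tuple(x % m for m in moduli)


def from_rns(residues, moduli=MODULI):
    """Recover the integer from its residues via the Chinese Remainder Theorem."""
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        # pow(Mi, -1, m) is the modular inverse of Mi modulo m (Python 3.8+).
        total += r * Mi * pow(Mi, -1, m)
    return total % M


def rns_dot(xs, ys, moduli=MODULI):
    """Dot product computed independently in each low-precision residue channel."""
    acc = [0] * len(moduli)
    for x, y in zip(xs, ys):
        rx, ry = to_rns(x, moduli), to_rns(y, moduli)
        for i, m in enumerate(moduli):
            # Multiply-accumulate stays below m in every channel.
            acc[i] = (acc[i] + rx[i] * ry[i]) % m
    return from_rns(acc, moduli)


if __name__ == "__main__":
    xs = [12, 7, 25, 3]
    ys = [9, 14, 6, 20]
    print(rns_dot(xs, ys))                   # RNS result
    print(sum(x * y for x, y in zip(xs, ys)))  # direct result for comparison
```

The reconstruction is exact only while the true result stays below the product of the moduli (32736 here), so in this sketch the moduli set determines the usable dynamic range.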