Lightening-Transformer: A Dynamically-operated Optically-interconnected Photonic Transformer Accelerator

Source: arXiv. Published as a conference paper at HPCA 2024; received the IEEE Reproducibility Badges. The implementation is available at https://github.com/zhuhanqing/Lightening-Transformer

The wide adoption and significant computing resource demands of attention-based Transformers, e.g., Vision Transformers and large language models (LLMs), have driven the demand for efficient hardware accelerators. There is growing interest in photonics as an alternative to digital electronics because of its high energy efficiency and ultra-fast processing speed. Photonic accelerators have shown promising results for CNNs, which rely mainly on weight-static linear operations. However, they struggle to support Transformer architectures efficiently, which calls into question the applicability of photonics to advanced ML tasks. The primary hurdle is their inefficiency in handling the workloads unique to Transformers, i.e., dynamic and full-range tensor multiplication, in which both operands change with every input and may take positive or negative values. In this work, we propose Lightening-Transformer, the first light-empowered, high-performance, and energy-efficient photonic Transformer accelerator. To overcome the fundamental limitations of prior designs, we introduce a novel dynamically-operated photonic tensor core, DPTC, a crossbar array of interference-based optical vector dot-product engines that supports highly parallel, dynamic, and full-range matrix multiplication. Furthermore, we design a dedicated accelerator that integrates these photonic computing cores with photonic interconnects for inter-core data broadcast, fully unleashing the power of optics. Comprehensive evaluations show that our design achieves >2.6x energy and >12x latency reductions compared to prior photonic accelerators, and delivers the lowest energy cost and 2 to 3 orders of magnitude lower energy-delay product compared to electronic Transformer accelerators, all while maintaining digital-comparable accuracy. Our work highlights the immense potential of photonics for advanced ML workloads, such as Transformer-based LLMs.

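To make "dynamic and full-range" multiplication concrete, here is a minimal numerical sketch in plain Python. It is not the paper's DPTC circuit. It assumes an idealized, lossless 50:50 interference of the two operand fields followed by differential photodetection, which is one standard way to recover a signed product from non-negative intensities. The function names (`interference_dot`, `crossbar_matmul`, `attention_scores`) are hypothetical and chosen only for illustration. The attention score computation Q·Kᵀ serves as the workload because both operands are activations that change with every input, unlike the fixed weights of a CNN layer.

```python
import math
import random


def interference_dot(x, y):
    """Signed dot product recovered from two non-negative intensity sums.

    Assumed idealized model (not taken from the paper): each pair (x_i, y_i)
    interferes to give field amplitudes (x_i + y_i)/sqrt(2) and
    (x_i - y_i)/sqrt(2). A detector measures intensity (amplitude squared),
    and the difference of the two accumulated intensities is 2 * sum(x_i * y_i).
    """
    i_plus = 0.0   # accumulated intensity at the "sum" port, always >= 0
    i_minus = 0.0  # accumulated intensity at the "difference" port, always >= 0
    for xi, yi in zip(x, y):
        i_plus += ((xi + yi) / math.sqrt(2.0)) ** 2
        i_minus += ((xi - yi) / math.sqrt(2.0)) ** 2
    return 0.5 * (i_plus - i_minus)


def crossbar_matmul(a, b):
    """Matrix product a @ b with one dot-product engine per output element.

    a is rows x inner, b is inner x cols, both given as lists of lists.
    """
    columns_of_b = list(zip(*b))
    return [[interference_dot(row, col) for col in columns_of_b] for row in a]


def attention_scores(q, k):
    """Scaled attention scores Q K^T / sqrt(d); both operands are dynamic."""
    d = len(q[0])
    k_transposed = [list(col) for col in zip(*k)]
    raw = crossbar_matmul(q, k_transposed)
    return [[value / math.sqrt(d) for value in row] for row in raw]


random.seed(0)
tokens, dim = 4, 8
q = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(tokens)]
k = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(tokens)]

scores = attention_scores(q, k)

# Plain digital reference for comparison.
reference = [
    [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(dim) for kj in k]
    for qi in q
]
max_abs_error = max(
    abs(s - r)
    for row_s, row_r in zip(scores, reference)
    for s, r in zip(row_s, row_r)
)
print("max abs error vs. digital reference:", max_abs_error)
```

In this idealized model the result matches the digital reference up to floating-point rounding, and neither operand is restricted to non-negative values or needs to be pre-programmed as a static weight. Real hardware adds noise, loss, and quantization, none of which are modeled here.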