The wide adoption of attention-based Transformers, e.g., Vision Transformers and large language models, and their significant consumption of computing resources have driven demand for efficient hardware accelerators. While electronic accelerators are commonly used, there is growing interest in photonics as an alternative technology because of its high energy efficiency and ultra-fast processing speed. Optical neural networks (ONNs) have demonstrated promising results on convolutional neural network (CNN) workloads, which require only weight-static linear operations. However, they cannot efficiently support Transformer architectures, because attention operations require dynamic, full-range tensor multiplication that existing ONNs are unable to process. In this work, we propose DOTA, a customized high-performance and energy-efficient photonic Transformer accelerator. To overcome this fundamental limitation of existing ONNs, we introduce a novel photonic tensor core, consisting of a crossbar array of interference-based optical vector dot-product engines, that supports highly parallel, dynamic, and full-range matrix-matrix multiplication. Our comprehensive evaluation demonstrates that DOTA achieves more than 4x lower energy and more than 10x lower latency than prior photonic accelerators, and delivers over 20x energy reduction and 2 to 3 orders of magnitude lower latency compared to the electronic Transformer accelerator. Our work highlights the immense potential of photonic computing for efficient hardware accelerators, particularly for advanced machine learning workloads.
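To make the idea of an interference-based dot-product engine concrete, the following is a minimal numerical sketch and not the DOTA circuit itself. It assumes that "full-range" means signed operands, that each operand is encoded as a real optical field amplitude, that the coupler is an ideal 50:50 device, and that detection is noiseless. The function names `interference_dot` and `crossbar_matmul` are hypothetical and chosen only for illustration. The sketch shows how subtracting two non-negative photodetector intensities can recover a signed dot product, and how a grid of such engines would produce a matrix-matrix product when both operands change at run time, as in attention.

```python
import numpy as np


def interference_dot(x, y):
    """Signed dot product from intensity-only detection (idealized, noiseless)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Ideal 50:50 coupler: the two output ports carry the sum and difference fields.
    e_plus = (x + y) / np.sqrt(2)
    e_minus = (x - y) / np.sqrt(2)
    # Each photodetector measures only non-negative intensity, accumulated over elements.
    i_plus = np.sum(e_plus ** 2)
    i_minus = np.sum(e_minus ** 2)
    # Balanced detection: the x^2 and y^2 terms cancel, leaving 2 * (x . y).
    return (i_plus - i_minus) / 2.0


def crossbar_matmul(a, b):
    """Matrix product where entry (i, j) comes from one dot-product engine."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    m, k = a.shape
    k2, n = b.shape
    if k != k2:
        raise ValueError("inner dimensions must match")
    out = np.empty((m, n))
    # In hardware the engines would run in parallel; the loops only model the grid.
    for i in range(m):
        for j in range(n):
            out[i, j] = interference_dot(a[i, :], b[:, j])
    return out


# Both operands are dynamic and signed, as with query and key matrices in attention.
rng = np.random.default_rng(0)
q = rng.uniform(-1.0, 1.0, size=(4, 8))
k = rng.uniform(-1.0, 1.0, size=(5, 8))
scores = crossbar_matmul(q, k.T)
assert np.allclose(scores, q @ k.T)
```

The subtraction step is what lets intensity-only detectors report negative results; a real device would also have to account for loss, noise, and finite encoding precision, none of which is modeled here.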