The growing computational demands of artificial intelligence (AI) are straining conventional electronics, making photonic computing a promising alternative. However, existing photonic architectures face fundamental scalability and reliability barriers. This paper introduces SKYLIGHT, a scalable 3D photonic in-memory tensor core architecture designed for real-time AI inference. SKYLIGHT overcomes these barriers by co-designing topology, wavelength routing, accumulation, and programming in a 3D stack. Its innovations include a low-loss 3D Si/SiN crossbar topology, a thermally robust wavelength-division multiplexing (WDM) component that does not rely on micro-ring resonators (MRRs), hierarchical signal accumulation using a multi-port photodetector (PD), and optically programmed non-volatile phase-change material (PCM) weights. Importantly, SKYLIGHT supports in-situ weight updates, which enable label-free, layer-local learning (e.g., forward-forward local updates) in addition to inference. Using SimPhony for system-level modeling, we show that a single 144 × 256 SKYLIGHT core is feasible within a single reticle and delivers 342.1 TOPS at 23.7 TOPS/W. This enables ResNet-50 inference at 1212 FPS with 27 mJ per image and an end-to-end efficiency of 84.17 FPS/W (1.61× higher than an NVIDIA RTX PRO 6000 Blackwell GPU under the same workload in real-time measurements). System-level evaluations on four representative machine learning tasks, including unsupervised local self-learning, demonstrate SKYLIGHT's robustness to realistic hardware non-idealities: low-bit quantization and signal-proportional analog noise that captures variations in modulation, PCM programming, and readout. With noise-aware training, SKYLIGHT maintains high task accuracy, validating its potential as a comprehensive solution for energy-efficient, large-scale photonic AI accelerators.
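The two non-idealities named above, low-bit quantization and signal-proportional analog noise, can be illustrated with a minimal sketch. This is not the SimPhony model or the authors' implementation. It assumes uniform symmetric quantization, weights and inputs normalized to [-1, 1], and independent Gaussian multiplicative noise on each weight-input product. The function names `quantize` and `noisy_matvec` and all parameter values are hypothetical.

```python
import random


def quantize(values, bits, max_abs=1.0):
    """Uniformly quantize values onto 2**bits levels spanning [-max_abs, max_abs]."""
    n_steps = 2 ** bits - 1
    step = 2 * max_abs / n_steps
    out = []
    for v in values:
        clipped = max(-max_abs, min(max_abs, v))
        # Snap to the nearest grid point -max_abs + k * step.
        out.append(round((clipped + max_abs) / step) * step - max_abs)
    return out


def noisy_matvec(weights, x, bits, noise_std, rng):
    """Matrix-vector product with quantized operands and multiplicative noise.

    Assumption: each product w * x is perturbed by a factor (1 + eps) with
    eps ~ N(0, noise_std**2), so the noise magnitude scales with the signal.
    """
    xq = quantize(x, bits)
    result = []
    for row in weights:
        wq = quantize(row, bits)
        acc = 0.0
        for w, xi in zip(wq, xq):
            acc += w * xi * (1.0 + rng.gauss(0.0, noise_std))
        result.append(acc)
    return result


if __name__ == "__main__":
    W = [[0.5, -0.25], [1.0, 1.0]]
    x = [1.0, 0.5]
    rng = random.Random(0)  # fixed seed for repeatability
    print("ideal :", [sum(w * xi for w, xi in zip(row, x)) for row in W])
    print("8-bit :", noisy_matvec(W, x, bits=8, noise_std=0.0, rng=rng))
    print("4-bit + 5% noise:", noisy_matvec(W, x, bits=4, noise_std=0.05, rng=rng))
```

Noise-aware training, in this picture, would mean applying such a perturbed forward pass during training so that the learned weights tolerate the perturbation. The sketch covers only the forward pass.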