SKYLIGHT: A Scalable Hundred-Channel 3D Photonic In-Memory Tensor Core Architecture for Real-time AI Inference

The growing computational demands of artificial intelligence (AI) are challenging conventional electronics, making photonic computing a promising alternative. However, existing photonic architectures face fundamental scalability and reliability barriers. This paper introduces SKYLIGHT, a scalable 3D photonic in-memory tensor core architecture designed for real-time AI inference. By co-designing its topology, wavelength routing, accumulation, and programming in a 3D stack, SKYLIGHT overcomes key limitations of prior designs. Its innovations include a low-loss 3D Si/SiN crossbar topology, a thermally robust wavelength-division multiplexing (WDM) component that does not rely on micro-ring resonators (MRRs), hierarchical signal accumulation using a multi-port photodetector (PD), and optically programmed non-volatile phase-change material (PCM) weights. Importantly, SKYLIGHT enables in-situ weight updates that support label-free, layer-local learning (e.g., forward-forward local updates) in addition to inference. Using SimPhony for system-level modeling, we show that a single 144 × 256 SKYLIGHT core is feasible within a single reticle and delivers 342.1 TOPS at 23.7 TOPS/W. This enables ResNet-50 inference at 1212 FPS with 27 mJ per image and achieves 84.17 FPS/W end-to-end (1.61× higher than an NVIDIA RTX PRO 6000 Blackwell GPU) under the same workload in real-time measurements. System-level evaluations on four representative machine learning tasks, including unsupervised local self-learning, demonstrate SKYLIGHT's robustness to realistic hardware non-idealities: low-bit quantization and signal-proportional analog noise that captures modulation, PCM programming, and readout variations. With noise-aware training, SKYLIGHT maintains high task accuracy, validating its potential as a comprehensive solution for energy-efficient, large-scale photonic AI accelerators.
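The hardware non-idealities named in the abstract (low-bit quantization plus signal-proportional analog noise on modulation, PCM programming, and readout) can be illustrated with a small sketch. This is not the SimPhony model or the authors' method; it is a minimal illustration under stated assumptions. The function names (`quantize`, `noisy`, `crossbar_matvec`), the 4-bit resolution, the 2% relative noise level, the [0, 1] value range, and the choice of 144 rows by 256 columns are all hypothetical.

```python
import random

# Assumed orientation of the 144 x 256 core; the abstract does not say
# which dimension is rows and which is columns.
ROWS, COLS = 144, 256


def quantize(x, bits, lo=0.0, hi=1.0):
    """Clip x to [lo, hi] and round it to one of 2**bits uniform levels."""
    levels = (1 << bits) - 1
    x = min(max(x, lo), hi)
    return lo + round((x - lo) / (hi - lo) * levels) / levels * (hi - lo)


def noisy(value, rel_std, rng):
    """Apply Gaussian noise whose standard deviation scales with the signal."""
    return value * (1.0 + rng.gauss(0.0, rel_std))


def crossbar_matvec(weights, x, bits=4, rel_std=0.02, rng=None):
    """Matrix-vector product with quantized, noisy weights, inputs, and readout."""
    rng = rng or random.Random(0)
    out = []
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            w_q = noisy(quantize(w, bits), rel_std, rng)   # stands in for PCM programming variation
            x_q = noisy(quantize(xi, bits), rel_std, rng)  # stands in for modulation variation
            acc += w_q * x_q
        out.append(noisy(acc, rel_std, rng))               # stands in for readout variation
    return out


# Usage: one noisy pass over random non-negative weights and inputs.
_rng = random.Random(42)
W = [[_rng.random() for _ in range(COLS)] for _ in range(ROWS)]
x_in = [_rng.random() for _ in range(COLS)]
y_noisy = crossbar_matvec(W, x_in)
```

In noise-aware training, a forward pass of this kind would typically replace the ideal matrix-vector product during training so that the learned weights tolerate the perturbations. Setting `rel_std=0.0` isolates the effect of quantization alone.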
