Photonic computing has emerged as a promising solution for accelerating computation-intensive artificial intelligence (AI) workloads. However, limited reconfigurability, high electrical-optical conversion cost, and thermal sensitivity restrict the deployment of current optical analog computing engines for power-restricted, performance-sensitive AI workloads at scale. Sparsity offers a great opportunity for hardware-efficient AI accelerators, yet current dense photonic accelerators fail to fully exploit the power-saving potential of algorithmic sparsity. Exploiting this potential requires sparsity-aware hardware specialization, with a fundamental redesign of the photonic tensor core topology and cross-layer device-circuit-architecture-algorithm co-optimization that accounts for hardware non-ideality and the power bottleneck. To trim redundant power consumption while maximizing robustness to thermal variations, we propose SCATTER, a novel algorithm-circuit co-sparse photonic accelerator featuring dynamically reconfigurable signal paths via thermal-tolerant, power-efficient in-situ light redistribution and power gating. A power-optimized, crosstalk-aware dynamic sparse training framework is introduced to explore row-column structured sparsity, ensuring marginal accuracy loss and maximum power efficiency. Extensive evaluation shows that our cross-stack-optimized accelerator SCATTER achieves a 511× area reduction and 12.4× power saving with superior crosstalk tolerance, enabling unprecedented circuit layout compactness and on-chip power efficiency.
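To make the row-column structured sparsity concrete, here is a minimal sketch (not the paper's actual training framework; the function name and the keep-top-k-by-L2-norm criterion are illustrative assumptions): entire rows and columns of a weight matrix are zeroed as a unit, which is the granularity at which a photonic tensor core could power-gate whole light paths.

```python
# Illustrative sketch only: row-column structured pruning of a weight
# matrix, keeping the top-k rows and columns by L2 norm and zeroing the
# rest. Zeroed rows/columns correspond to signal paths that could be
# power-gated in hardware. The selection heuristic here is an assumption,
# not the criterion used by SCATTER's training framework.

def row_col_prune(W, keep_rows, keep_cols):
    """Zero all but the `keep_rows` strongest rows and `keep_cols`
    strongest columns (by L2 norm) of matrix W, a list of lists."""
    n_rows, n_cols = len(W), len(W[0])
    row_norm = [sum(w * w for w in row) for row in W]
    col_norm = [sum(W[i][j] ** 2 for i in range(n_rows))
                for j in range(n_cols)]
    top_rows = set(sorted(range(n_rows), key=lambda i: -row_norm[i])[:keep_rows])
    top_cols = set(sorted(range(n_cols), key=lambda j: -col_norm[j])[:keep_cols])
    # Keep an entry only if both its row and column survive pruning.
    return [[W[i][j] if i in top_rows and j in top_cols else 0.0
             for j in range(n_cols)] for i in range(n_rows)]

W = [[0.9, 0.1, 0.8],
     [0.05, 0.02, 0.01],
     [0.7, 0.2, 0.9]]
pruned = row_col_prune(W, keep_rows=2, keep_cols=2)
# The weak middle row and middle column are zeroed as whole units.
```

Because pruning acts on whole rows and columns rather than individual weights, the resulting zero pattern maps directly onto coarse hardware structures (e.g., gated rows/columns of a crossbar-style tensor core) instead of requiring fine-grained per-element gating.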