Transformer-based networks have emerged as prominent AI models with state-of-the-art performance and may pave the way toward artificial general intelligence (AGI). However, their large sizes still hinder efficient implementation, highlighting the need for alternative solutions that enable energy-efficient acceleration. Recent state-of-the-art works have proposed photonic transformer accelerators (PTAs) that offer significant speedup and energy-efficiency improvements over conventional electronic accelerators. However, these PTA architectures are developed without considering application constraints (e.g., area, power, energy, and latency). Moreover, their manual design approach requires substantial design time to determine a suitable architecture for the targeted application, which makes it unscalable. To address these limitations, we propose DxPTA, a novel design space exploration methodology that enables efficient hardware/software co-design of a PTA architecture meeting all given constraints. This is achieved by (1) identifying the PTA architecture parameters based on the coherent optical dataflow; (2) analyzing the impact and significance of these parameters; and (3) leveraging this analysis to devise a constraint-aware architecture search algorithm. Experimental results show that DxPTA can find appropriate PTA architectures for different transformer-based models (i.e., DeiT-T/S/B and BERT-B/L). Under constraints of 50 mm² area, 5 W power, 50 mJ energy, and 10 ms latency, the resulting architectures have at most 26 mm² area, 4.8 W power, 39 mJ energy, and 6 ms latency, and the search is 15.2× faster than an exhaustive approach. These results demonstrate the potential of the DxPTA methodology for enabling efficient PTA designs for diverse AGI-based applications.
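To make the idea of a constraint-aware architecture search concrete, here is a minimal Python sketch under stated assumptions. It is not the DxPTA algorithm. The two design parameters (`n_cores`, `array_dim`), the `toy_cost_model` formulas, and the pruning rule are all hypothetical and were chosen only for illustration. The sketch shows two things: checking a candidate's area, power, energy, and latency against the four constraints, and skipping larger candidates once a metric that grows monotonically with size (here area and power) already exceeds its limit. That pruning is a generic monotonicity trick and is what lets the search evaluate fewer points than exhaustive enumeration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    area_mm2: float
    power_w: float
    energy_mj: float
    latency_ms: float


def meets(m: Metrics, c: Metrics) -> bool:
    """True if every metric of m is within the matching constraint in c."""
    return (m.area_mm2 <= c.area_mm2 and m.power_w <= c.power_w
            and m.energy_mj <= c.energy_mj and m.latency_ms <= c.latency_ms)


def toy_cost_model(n_cores: int, array_dim: int) -> Metrics:
    # HYPOTHETICAL cost model for illustration only, not a photonic model.
    units = n_cores * array_dim ** 2          # number of compute units
    power = 0.5 + 0.002 * units               # static + dynamic power (W)
    latency = 1000.0 / units                  # more units -> lower latency (ms)
    return Metrics(area_mm2=0.01 * units, power_w=power,
                   energy_mj=power * latency,  # W x ms = mJ
                   latency_ms=latency)


def constraint_aware_search(cores_options, dim_options, c: Metrics):
    """Return (best design, its metrics, number of evaluated designs).

    'Best' means the lowest latency among designs meeting all constraints.
    """
    best, best_m, evaluated = None, None, 0
    for n in sorted(cores_options):
        for d in sorted(dim_options):
            m = toy_cost_model(n, d)
            evaluated += 1
            # Area and power only grow with array_dim in this toy model,
            # so larger arrays for this core count cannot be feasible.
            if m.area_mm2 > c.area_mm2 or m.power_w > c.power_w:
                break
            if meets(m, c) and (best_m is None or m.latency_ms < best_m.latency_ms):
                best, best_m = (n, d), m
    return best, best_m, evaluated


if __name__ == "__main__":
    constraints = Metrics(area_mm2=50, power_w=5, energy_mj=50, latency_ms=10)
    best, metrics, n_eval = constraint_aware_search(
        [1, 2, 4, 8, 16], [8, 16, 32, 64], constraints)
    print(best, metrics, n_eval)
```

With these toy numbers, the search returns `(2, 32)` after evaluating 16 of the 20 candidate designs. The same `meets` check also confirms that the abstract's reported worst-case figures (26 mm², 4.8 W, 39 mJ, 6 ms) lie within the stated constraints.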