This paper presents a tool for automatically exploring the design space of deep learning accelerators (DLAs). Our main advancement is Starlight, a data-driven performance model that uses transfer learning to bridge the gap between fast, low-fidelity evaluation methods (such as analytical models) and slow, high-fidelity evaluation methods (such as RTL simulation). Starlight is fast: It can provide 6,500 predictions per second, allowing the evaluation of millions of configurations per hour. Starlight is accurate: It predicts the energy-delay product measured by RTL simulation with 99\% accuracy. And Starlight can be trained efficiently: It can be trained with 61\% fewer samples than DOSA's state-of-the-art data-driven performance predictor. Our second contribution is Polaris, a design-space exploration tool that uses Starlight to efficiently search the large, complex hardware/software co-design space of DLAs. In under 35 minutes, Polaris produces DLA designs that match the performance of designs that take six hours to produce with DOSA. And in under 3.3 hours, Polaris produces DLA designs that reduce energy-delay product by 2.7$\times$ over the best designs found by DOSA.
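As a sanity check on the headline figures, the short sketch below redoes the arithmetic behind the throughput and search-time claims. It assumes the standard definition of energy-delay product (energy multiplied by delay). The function names are illustrative and are not part of Starlight or Polaris.

```python
# Figures quoted in the abstract; the helper names below are hypothetical.
PREDICTIONS_PER_SECOND = 6_500
DOSA_SEARCH_MINUTES = 6 * 60   # "six hours"
POLARIS_SEARCH_MINUTES = 35    # upper bound ("under 35 minutes")


def predictions_per_hour(per_second: int) -> int:
    """Convert a per-second prediction rate into a per-hour rate."""
    return per_second * 3600


def energy_delay_product(energy_joules: float, delay_seconds: float) -> float:
    """Standard EDP definition: energy times delay (lower is better)."""
    return energy_joules * delay_seconds


def search_speedup(baseline_minutes: float, new_minutes: float) -> float:
    """Ratio of baseline search time to new search time."""
    return baseline_minutes / new_minutes


if __name__ == "__main__":
    print(predictions_per_hour(PREDICTIONS_PER_SECOND))
    print(search_speedup(DOSA_SEARCH_MINUTES, POLARIS_SEARCH_MINUTES))
```

At 6,500 predictions per second the model evaluates 23.4 million configurations per hour, which matches the "millions of configurations per hour" claim. Reaching DOSA-quality designs in under 35 minutes instead of six hours is a search-time reduction of more than 10$\times$.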