In recent years, domain-specific accelerators (DSAs) have gained popularity for applications such as deep learning and autonomous driving. To facilitate DSA design, programmers use high-level synthesis (HLS) to compile a high-level description written in C/C++ into a design in a low-level hardware description language, which is eventually synthesized into a DSA circuit. However, creating a high-quality HLS design still demands significant domain knowledge, particularly for the microarchitecture decisions expressed as \textit{pragmas}. It is therefore desirable to automate these decisions with machine learning models that predict the quality of HLS designs, which requires a deeper understanding of the program, consisting of the original code and its pragmas. Such programs can naturally be treated as sequence data. They can also be compiled and converted into a control data flow graph (CDFG). However, existing work either fails to leverage both modalities or combines them only in shallow or coarse ways. We propose ProgSG, a model that allows the source code sequence modality and the graph modality to interact in a deep and fine-grained way. To alleviate the scarcity of labeled designs, we also propose a pre-training method based on a suite of compiler data flow analysis tasks. Experimental results show that ProgSG reduces the root mean squared error (RMSE) of design performance predictions by up to $22\%$, and that, in the design space exploration (DSE) task, it identifies designs with an average performance improvement of $1.10\times$ and $1.26\times$ (up to $8.17\times$ and $13.31\times$) over HARP and AutoDSE, respectively.
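To make the notion of a program "consisting of the original code and its pragmas" concrete, the following is a minimal sketch of a pragma design space for one small kernel. It is not ProgSG or any part of the authors' pipeline. The `vec_add` kernel, the candidate pragma values, and the `toy_latency` cost function are all hypothetical and stand in for a real kernel and a learned performance predictor. The pragma spellings follow the common `#pragma HLS pipeline` / `#pragma HLS unroll` style and are used only for illustration.

```python
from itertools import product

# Assumed design space: whether to pipeline the loop, and by how much to unroll it.
PIPELINE_CHOICES = [False, True]
UNROLL_FACTORS = [1, 2, 4, 8]
TRIP_COUNT = 64  # loop trip count of the hypothetical kernel


def enumerate_designs():
    """List every pragma configuration in the (tiny) design space."""
    return [
        {"pipeline": p, "unroll": u}
        for p, u in product(PIPELINE_CHOICES, UNROLL_FACTORS)
    ]


def render(design):
    """Render one design point as C source with its pragmas inserted."""
    pragmas = []
    if design["pipeline"]:
        pragmas.append("#pragma HLS pipeline II=1")
    if design["unroll"] > 1:
        pragmas.append(f"#pragma HLS unroll factor={design['unroll']}")
    body = "".join(f"    {p}\n" for p in pragmas)
    return (
        "void vec_add(const int a[64], const int b[64], int out[64]) {\n"
        "  for (int i = 0; i < 64; i++) {\n"
        f"{body}"
        "    out[i] = a[i] + b[i];\n"
        "  }\n"
        "}\n"
    )


def toy_latency(design):
    """Placeholder cost model (an assumption, not a real HLS estimate).

    A learned predictor would take the source text and its graph as input;
    here latency is simply iterations times an assumed cycles-per-iteration.
    """
    iterations = TRIP_COUNT // design["unroll"]
    cycles_per_iteration = 1 if design["pipeline"] else 3
    return iterations * cycles_per_iteration


def explore():
    """Pick the design with the lowest predicted latency."""
    return min(enumerate_designs(), key=toy_latency)


if __name__ == "__main__":
    best = explore()
    print(render(best))
    print("predicted latency (toy units):", toy_latency(best))
```

Even this two-pragma loop yields eight candidate designs. Real kernels have many loops and many more pragma options, which is why a fast learned predictor is attractive as a replacement for running synthesis on every candidate.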