The remarkable growth and success of machine learning have expanded its applications into programming languages and program analysis. However, a key challenge in adopting the latest machine learning methods is the representation of programming languages, which directly impacts the ability of these methods to reason about programs. Previous representations have been limited in performance by a lack of numerical awareness, missing information about aggregate data structures, and an improper way of presenting variables. To overcome these limitations, we propose a graph-based program representation called PERFOGRAPH. PERFOGRAPH captures numerical information and aggregate data structures by introducing new nodes and edges. Furthermore, we propose an adapted embedding method to incorporate numerical awareness. These enhancements make PERFOGRAPH a highly flexible and scalable representation that effectively captures programs' intricate dependencies and semantics. Consequently, it serves as a powerful tool for applications such as program analysis, performance optimization, and parallelism discovery. Our experimental results demonstrate that PERFOGRAPH outperforms existing representations and sets new state-of-the-art results on the well-known Device Mapping challenge, reducing the error rate by 7.4% (AMD dataset) and 10% (NVIDIA dataset). It also sets new state-of-the-art results in performance optimization tasks such as Parallelism Discovery and NUMA and Prefetchers Configuration prediction.