The remarkable growth and success of machine learning have expanded its applications into programming languages and program analysis. However, a key challenge in adopting the latest machine learning methods is the representation of programming languages, which directly impacts the ability of these methods to reason about programs. Previous representations lack numerical awareness and composite data structure information, and they present variables improperly, all of which limits their performance. To overcome these limitations, we propose a novel graph-based program representation called PERFOGRAPH. PERFOGRAPH captures numerical information and composite data structures by introducing new nodes and edges. Furthermore, we propose an adapted embedding method to incorporate numerical awareness. These enhancements make PERFOGRAPH a highly flexible and scalable representation that can effectively capture intricate program dependencies and semantics. Consequently, it serves as a powerful tool for applications such as program analysis, performance optimization, and parallelism discovery. Our experimental results demonstrate that PERFOGRAPH outperforms existing representations and sets new state-of-the-art results, reducing the error rate by 7.4% (AMD dataset) and 10% (NVIDIA dataset) in the well-known Device Mapping challenge. It also sets new state-of-the-art results in various performance optimization tasks, such as Parallelism Discovery and NUMA and Prefetchers Configuration prediction.