With the end of Moore's Law, there is growing demand for rapid architectural innovation in modern processors, such as RISC-V custom extensions, to sustain performance scaling. Program sampling is a crucial step in microprocessor design, as it selects representative simulation points for workload simulation. While SimPoint has been the de facto approach for decades, the limited expressiveness of its Basic Block Vectors (BBVs) requires time-consuming human tuning, often taking months, which impedes fast innovation and agile hardware development. This paper introduces Neural Program Sampling (NPS), a novel framework that learns execution embeddings using dynamic snapshots of a Graph Neural Network (GNN). NPS deploys AssemblyNet for embedding generation, leveraging an application's code structures and runtime states. AssemblyNet serves as NPS's graph model and neural architecture, capturing a program's behavior in aspects such as data computation, code path, and data flow. AssemblyNet is trained on a data prefetch task that predicts consecutive memory addresses. In our experiments, NPS outperforms SimPoint by up to 63% and reduces the average error by 38%. Additionally, NPS is both more accurate and more robust, which reduces the expensive overhead of accuracy tuning. Furthermore, NPS shows higher accuracy and generality than the state-of-the-art GNN approach to code behavior learning, enabling the generation of high-quality execution embeddings.
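To make the BBV-based baseline concrete, the following is a minimal sketch of the SimPoint-style flow as it is commonly described: build one basic-block vector per execution interval, cluster the vectors with k-means, and simulate only the interval closest to each centroid, weighted by cluster size. It is not the NPS method and not the SimPoint tool itself; it omits steps such as dimensionality reduction and automatic selection of the number of clusters. All function names and the toy data are hypothetical and chosen for illustration only.

```python
import numpy as np

def normalize_bbvs(bbvs):
    """Scale each interval's basic-block counts so they sum to 1."""
    bbvs = np.asarray(bbvs, dtype=float)
    totals = bbvs.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # avoid division by zero for empty intervals
    return bbvs / totals

def assign(points, centroids):
    """Index of the nearest centroid (Euclidean distance) for each point."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(points), size=k, replace=False)
    centroids = points[start].copy()
    for _ in range(iters):
        labels = assign(points, centroids)
        for c in range(k):
            members = points[labels == c]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[c] = members.mean(axis=0)
    return centroids, assign(points, centroids)

def select_simulation_points(bbvs, k, seed=0):
    """Return sorted (interval_index, weight) pairs, one per non-empty cluster."""
    points = normalize_bbvs(bbvs)
    centroids, labels = kmeans(points, k, seed=seed)
    picks = []
    for c in range(k):
        idx = np.flatnonzero(labels == c)
        if len(idx) == 0:
            continue
        d = np.linalg.norm(points[idx] - centroids[c], axis=1)
        picks.append((int(idx[d.argmin()]), len(idx) / len(points)))
    return sorted(picks)

def estimate_metric(picks, per_interval_metric):
    """Weighted estimate of a whole-program metric from the chosen intervals."""
    return sum(w * per_interval_metric[i] for i, w in picks)

# Toy data (invented): six intervals, three basic blocks, two program phases.
toy_bbvs = [[90, 10, 0], [88, 12, 0], [92, 8, 0],
            [0, 10, 90], [0, 12, 88], [0, 8, 92]]
toy_cpi = [1.0, 1.1, 0.9, 2.0, 2.1, 1.9]  # hypothetical per-interval CPI

picks = select_simulation_points(toy_bbvs, k=2)
estimated_cpi = estimate_metric(picks, toy_cpi)
true_cpi = sum(toy_cpi) / len(toy_cpi)
print(picks, estimated_cpi, true_cpi)
```

In this flow the quality of the sample depends entirely on how well the per-interval vectors separate program phases. That is the step where NPS, per the abstract, substitutes learned execution embeddings for BBVs.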