The exponential increase in complex IPs within modern SoCs, driven by Moore's Law, has created a pressing need for fast and accurate hardware-software power-performance analysis. Traditional performance simulators (such as cycle-accurate simulators) are often too slow to simulate full benchmarks within a reasonable timeframe; they require considerable effort to develop, maintain, and extend; and they are prone to errors, making pre-silicon performance projections and competitive analysis increasingly challenging. Prior attempts to address this challenge using machine learning fall short because they are slow, inaccurate, or unable to predict the performance of full benchmarks. To address these limitations, we present PAI, the first technique to accurately predict full-benchmark performance without relying on detailed simulation or instruction-wise encoding. At the heart of PAI is a hierarchical Long Short-Term Memory (LSTM)-based model that takes a trace of microarchitecture-independent features from a program execution and predicts performance metrics. We present the detailed design, implementation, and evaluation of PAI. In our initial experiments, PAI achieves an average instructions-per-cycle (IPC) prediction error of 9.35% on the SPEC CPU 2017 benchmark suite while taking only 2 min 57 sec for the entire suite. This prediction error is comparable to that of prior state-of-the-art techniques, while PAI requires three orders of magnitude less time.
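The abstract reports an average IPC prediction error but does not say how the average is formed. As a minimal sketch, assuming the figure is a mean absolute percentage error taken over per-benchmark IPC values, it could be computed as follows. The function name `mean_abs_pct_error` and the IPC values are hypothetical and purely illustrative; they are not taken from the paper.

```python
def mean_abs_pct_error(predicted, measured):
    """Average of |predicted - measured| / measured over all benchmarks, in percent.

    Assumption: this is one plausible definition of "average prediction error";
    the abstract itself does not define the metric.
    """
    if len(predicted) != len(measured) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [abs(p - m) / m * 100.0 for p, m in zip(predicted, measured)]
    return sum(errors) / len(errors)


# Hypothetical IPC values for three benchmarks (illustrative, not from the paper).
measured_ipc = [1.0, 2.0, 0.5]
predicted_ipc = [1.1, 1.8, 0.5]

average_error = mean_abs_pct_error(predicted_ipc, measured_ipc)
print(f"average IPC prediction error: {average_error:.2f}%")
```

Averaging relative rather than absolute errors keeps benchmarks with a high IPC from dominating the result, which is one reason a percentage-based metric is a natural choice when a suite mixes workloads with very different IPC.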