Towards CPU Performance Prediction: New Challenge Benchmark Dataset and Novel Approach

CPU performance prediction, which forecasts a CPU's performance scores from its hardware characteristics during operation, is a critical technology for computational system design and resource management in the big data era. However, the field currently faces two significant challenges. First, collecting real-world data is difficult because of the wide variety of CPU products on the market and the highly specialized nature of the relevant hardware characteristics. As a result, the field lacks a standard dataset with unified hardware characteristics, wide data coverage, and comprehensive benchmarks. Second, existing methods based on hardware simulation models or machine learning have notable shortcomings, such as lengthy simulation test cycles and low prediction accuracy. To bridge these gaps, we first collect, preprocess, and standardize historical data from 4th Generation Intel Xeon Scalable Processors across multiple benchmark suites to create a new dataset, PerfCastDB. We then design a deep learning-based model, the Nova CPU Performance Predictor (NCPP), as the baseline for this dataset. NCPP is built on a group attention mechanism: it quantifies the implicit relationships between hardware characteristics within and across groups and models the impact of the various hardware characteristics on CPU performance. In comparative experiments on PerfCastDB, NCPP achieves better evaluation results than existing approaches, demonstrating its effectiveness. We have open-sourced part of the dataset and the NCPP network code to facilitate subsequent research; the resources are available at https://github.com/xiaoman-liu/NCPP.
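The abstract does not specify how the group attention mechanism is built, so the following is only a minimal illustrative sketch of the general idea of attending within and then across feature groups. It is not the NCPP implementation. The group names, the toy embedding values, the mean pooling step, and the use of attention without learned projections (Q = K = V) are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with Q = K = V = vectors.

    Assumption: no learned projection matrices, to keep the sketch short.
    """
    dim = len(vectors[0])
    outputs = []
    for query in vectors:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # Each output is a weighted average of all input vectors.
        outputs.append(
            [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
        )
    return outputs


def mean_pool(vectors):
    """Average a list of equal-length vectors into one summary vector."""
    count = len(vectors)
    return [sum(v[i] for v in vectors) / count for i in range(len(vectors[0]))]


def group_attention(groups):
    """Attend within each group, pool, then attend across group summaries.

    groups: dict mapping a group name to a list of feature embeddings.
    Returns a dict mapping each group name to its cross-group vector.
    """
    # Step 1: relationships between features inside the same group.
    summaries = {
        name: mean_pool(self_attention(vectors))
        for name, vectors in groups.items()
    }
    # Step 2: relationships between the groups themselves.
    names = list(summaries)
    mixed = self_attention([summaries[name] for name in names])
    return dict(zip(names, mixed))


# Hypothetical toy embeddings for two hardware feature groups.
groups = {
    "compute": [[1.0, 0.0], [0.5, 0.5]],
    "memory": [[0.0, 1.0], [0.2, 0.8], [0.1, 0.9]],
}
result = group_attention(groups)
```

In a real model, the group summaries would typically feed a regression head that outputs the predicted benchmark score, and the attention layers would include trainable projections; both are omitted here.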
