The Standard Performance Evaluation Corporation (SPEC) CPU benchmark suite has been widely used as a measure of computing performance for decades. SPEC CPU is an industry-standardized, CPU-intensive benchmark suite, and its collective results serve as a proxy for the history of worldwide CPU and system performance. Past efforts have not provided or enabled answers to questions such as: How has the SPEC benchmark suite evolved empirically over time, and which micro-architectural artifacts have had the most influence on performance? Have any individual benchmarks within the suite had undue influence on the results and on comparisons among the codes? Can the answers to these questions provide insights into the future of computer system performance? To answer these questions, we present a detailed historical and statistical analysis of the effect of specific hardware artifacts (clock frequencies, core counts, etc.) on the performance of the SPEC benchmarks since 1995. We discuss in detail several methods for normalizing across benchmark-suite revisions. We perform both isolated and collective sensitivity analyses for various hardware artifacts, and we identify one benchmark (libquantum) that had a somewhat undue influence on performance outcomes. We also demonstrate the use of SPEC data to predict future performance.