CPU Simulation Using Two-Phase Stratified Sampling

Simulation remains a cornerstone of computer architecture research, yet full end-to-end application execution is prohibitively time-consuming. The industry-standard solution, SimPoint, mitigates this cost by selecting a small number of representative code regions that capture program phase behavior. In this work, we take a fresh look at phase behavior in the SPEC CPU 2017 Integer suite to assess how pronounced such behavior truly is and what accuracy can be expected from typical SimPoint usage. Based on previously published data, we argue that commonly used SimPoint counts can induce substantial estimation errors. To explore this further, we recast SimPoint as a stratified sampling problem, which enables the derivation of a conservative confidence interval. This analysis predicts significant errors, and our empirical results confirm it: with 20 SimPoints, two applications exhibit performance prediction errors of 40% to 60%. We decompose SimPoint into its two fundamental components, stratification (clustering) and sample-unit selection (centroid choice), and analyze their individual effects on accuracy. We then extend the approach into a two-phase (double) sampling scheme, in which a large preliminary random sample enables improved stratification and more representative region selection. Using this method, the maximum per-application error drops to 3%. Finally, we demonstrate that the proposed two-phase stratified framework achieves an order-of-magnitude reduction in required sample size compared to simple random sampling while maintaining a tight analytical confidence interval, suggesting a practical path toward statistically grounded and efficient architectural simulation.
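To make the two-phase (double) sampling idea concrete, the following is a minimal sketch under stated assumptions, not the method evaluated in this work. It assumes each code region has a cheap-to-obtain feature and an expensive metric (for example, CPI from detailed simulation). Equal-count binning on that feature is swapped in for clustering, and phase-two units are chosen at random rather than by centroid. The names `make_synthetic_regions` and `two_phase_stratified_mean`, the synthetic data, and all parameter values are hypothetical.

```python
import random
import statistics


def make_synthetic_regions(n, rng):
    """Build hypothetical (cheap_feature, metric) pairs; the metric stands in for CPI."""
    regions = []
    for _ in range(n):
        x = rng.random()                             # cheap feature in [0, 1)
        cpi = 1.0 + 2.0 * x + rng.gauss(0.0, 0.05)   # assumed relation plus noise
        regions.append((x, cpi))
    return regions


def two_phase_stratified_mean(regions, n_phase1, n_strata, n_per_stratum, rng):
    """Estimate the mean metric using a large cheap sample, then a few costly units."""
    # Phase 1: large random sample; only the cheap feature is used for stratification.
    phase1 = rng.sample(regions, n_phase1)
    phase1.sort(key=lambda r: r[0])

    # Stratify into equal-count bins on the cheap feature (a stand-in for clustering).
    bounds = [round(i * n_phase1 / n_strata) for i in range(n_strata + 1)]
    strata = [phase1[bounds[i]:bounds[i + 1]] for i in range(n_strata)]

    estimate = 0.0
    for stratum in strata:
        if not stratum:
            continue
        weight = len(stratum) / n_phase1             # stratum weight from phase 1
        # Phase 2: reading the metric here plays the role of detailed simulation.
        chosen = rng.sample(stratum, min(n_per_stratum, len(stratum)))
        estimate += weight * statistics.fmean([m for _, m in chosen])
    return estimate


rng = random.Random(0)
regions = make_synthetic_regions(10_000, rng)
true_mean = statistics.fmean([m for _, m in regions])
estimate = two_phase_stratified_mean(regions, 1000, 10, 3, rng)
rel_error = abs(estimate - true_mean) / true_mean
print(f"true={true_mean:.3f} estimate={estimate:.3f} rel_error={rel_error:.2%}")
```

In this sketch only 30 regions have their metric read in phase two, while the stratum weights come from the 1000-region preliminary sample. The weights and the strata are therefore informed by far more data than is simulated in detail.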
