Simulation remains a cornerstone of computer architecture research, yet full end-to-end application execution is prohibitively time-consuming. The industry-standard solution, SimPoint, mitigates this cost by selecting a small number of representative code regions that capture program phase behavior. In this work, we take a fresh look at phase behavior in the SPEC CPU 2017 Integer suite to assess how pronounced such behavior truly is and what accuracy can be expected from typical SimPoint usage. Based on previously published data, we argue that common SimPoint counts can induce substantial estimation errors. To explore this further, we recast SimPoint as a stratified sampling problem, which enables the derivation of a conservative confidence interval. The analysis indicates that significant errors are expected, and our empirical analysis confirms this: with 20 SimPoints, two applications exhibit 40-60% performance prediction error. We decompose SimPoint into its two fundamental components - stratification (clustering) and sample-unit selection (centroid choice) - and analyze their individual effects on accuracy. We then extend the approach into a two-phase (double) sampling scheme, in which a large preliminary random sample enables improved stratification and more representative region selection. Using this method, the maximum per-application error drops to 3%. Finally, we demonstrate that the proposed two-phase stratified framework achieves an order-of-magnitude reduction in required sample size compared to simple random sampling while maintaining a tight analytical confidence interval, suggesting a practical path toward statistically grounded and efficient architectural simulation.
翻译:暂无翻译