A Linear Combination-based Method to Construct Proxy Benchmarks for Big Data Workloads

During the early stages of CPU design, benchmarks can only be run on simulators to evaluate CPU performance. However, most big data benchmarks have code bases so large that they cannot finish running on a simulator within an acceptable time. Moreover, big data benchmarks usually depend on complex software stacks, which are hard to port to simulators. Proxy benchmarks avoid long running times and complex software stacks while preserving the micro-architectural metrics of the real benchmarks, so they can represent the real benchmarks' micro-architectural characteristics. Proxy benchmarks can therefore replace real benchmarks on simulators for evaluating CPU performance. The biggest challenge is guaranteeing that a proxy benchmark has exactly the same micro-architectural metrics as the real benchmark when the number of metrics is very large. To address this challenge, we propose a linear combination-based proxy benchmark generation methodology that transforms the problem into solving a system of linear equations. We also design corresponding algorithms to ensure that solving the linear equation system converges: even when the system does not have a unique solution, the algorithm can find the best solution using the non-negative least squares method. We generate fifteen proxy benchmarks and evaluate their running time and accuracy against the corresponding real benchmarks for MySQL and RocksDB. On a typical Intel Xeon platform, the average running time of the proxy benchmarks is 1.62 s and the average accuracy of every micro-architectural metric is over 92%, whereas the longest running time of the real benchmarks is nearly 4 hours. We also conduct two case studies demonstrating that our proxy benchmarks remain consistent with the real benchmarks both before and after prefetching or Hyper-Threading is turned on.
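To make the linear-combination idea concrete, the following is a minimal sketch and not the algorithm from the paper, whose details are not given in this abstract. It assumes that each column of a matrix `A` holds the measured metrics of one hypothetical building-block program, that `b` holds the metrics of the real benchmark, and that non-negative weights `x` are sought so that `A x` is as close to `b` as possible. The solver here is a plain coordinate-descent routine standing in for a non-negative least squares solver. All names, the example numbers, and the accuracy formula `1 - |proxy - real| / |real|` are illustrative assumptions.

```python
# Hypothetical sketch: names, data, and the accuracy formula are assumptions,
# not taken from the paper.

def nnls_coordinate_descent(A, b, sweeps=500):
    """Approximately minimise ||A x - b||^2 subject to x >= 0.

    A: list of rows, one row per metric, one column per building block.
    b: target metric vector of the real benchmark.
    """
    m, n = len(A), len(A[0])
    # Precompute the normal-equation terms H = A^T A and c = A^T b.
    H = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)]
         for i in range(n)]
    c = [sum(A[k][i] * b[k] for k in range(m)) for i in range(n)]
    x = [0.0] * n
    for _ in range(sweeps):
        for j in range(n):
            if H[j][j] == 0.0:
                continue  # an all-zero column cannot change the fit
            # Partial derivative of 0.5 * ||A x - b||^2 with respect to x[j].
            grad_j = sum(H[j][i] * x[i] for i in range(n)) - c[j]
            # Exact minimisation along coordinate j, clipped at zero.
            x[j] = max(0.0, x[j] - grad_j / H[j][j])
    return x


def combine(A, x):
    """Metrics predicted for the weighted combination of building blocks."""
    return [sum(a * w for a, w in zip(row, x)) for row in A]


def accuracy(proxy, real):
    """Per-metric accuracy, assumed here to be 1 - relative error."""
    return [1.0 - abs(p - r) / abs(r) for p, r in zip(proxy, real)]


# Made-up example: 4 metrics, 3 building blocks.
A = [
    [1.0, 0.5, 0.2],
    [0.1, 0.9, 0.3],
    [0.4, 0.2, 0.8],
    [0.3, 0.3, 0.3],
]
b = [2.2, 0.5, 1.6, 0.9]  # constructed as A times the weights [2, 0, 1]

weights = nnls_coordinate_descent(A, b)
per_metric_accuracy = accuracy(combine(A, weights), b)
print(weights)
print(per_metric_accuracy)
```

In this constructed example an exact non-negative solution exists. When the system is inconsistent or under-determined, the same routine still returns non-negative weights that reduce the residual, which is the role the abstract assigns to non-negative least squares. A production setup would more likely call an established solver such as `scipy.optimize.nnls`.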
