In the perpetual pursuit of performance, modern computing systems rely ever more on stateful mechanisms to accommodate the dynamics of workloads and physical environments, bolstering efficiency but confounding benchmarking and thereby the optimization of software. Indeed, by their nature, adaptive mechanisms introduce temporal dependencies between measurements and render naive estimators of individual program performance biased. Observing that rectifying such biases necessitates speculative assumptions about system dynamics, we call for prioritizing performance differentials over absolute measures and formalize software benchmarking as the decision problem of identifying the fastest program, for which relative knowledge suffices. To this end, we propose simple experiment designs admitting consistent estimators of contrasts, whereby program-specific biases cancel under tenable assumptions. These designs asymptotically yield the correct decision and afford a robust methodology for finite-budget benchmarking in stateful environments, bearing broad implications for the development of performance-sensitive software.
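As a toy illustration of why a contrast can be recovered where absolute measures cannot, the following Python sketch simulates two programs on a machine whose hidden state slows every run a little more than the last. Everything here is an assumption made for illustration: the linear-drift state model, the noise level, the names `simulate` and `contrast`, and the balanced ABBA ordering are hypothetical stand-ins, not the designs proposed in this work.

```python
import random
from statistics import mean


def simulate(order, base, drift_per_run=0.001, noise=0.05, seed=0):
    """Run programs in the given order on a toy stateful machine.

    Assumption: the hidden state is an additive slowdown that grows
    linearly with the number of runs executed so far.
    """
    rng = random.Random(seed)
    samples = {name: [] for name in base}
    for t, name in enumerate(order):
        state = drift_per_run * t  # hidden slowdown at run t
        samples[name].append(base[name] + state + rng.gauss(0.0, noise))
    return samples


def contrast(samples, a, b):
    """Estimate the performance differential: mean time of a minus mean time of b."""
    return mean(samples[a]) - mean(samples[b])


# Hypothetical true run times: B is faster, so the true contrast A - B is +0.10.
base = {"A": 1.10, "B": 1.00}
n = 500  # runs per program

blocked = ["A"] * n + ["B"] * n             # all of A first, then all of B
balanced = ["A", "B", "B", "A"] * (n // 2)  # ABBA blocks cancel linear drift

blocked_est = contrast(simulate(blocked, base), "A", "B")
balanced_samples = simulate(balanced, base)
balanced_est = contrast(balanced_samples, "A", "B")

# Absolute estimate for A under the balanced design; it still absorbs the drift.
absolute_a = mean(balanced_samples["A"])

print(f"blocked contrast:  {blocked_est:+.3f}")
print(f"balanced contrast: {balanced_est:+.3f}")
print(f"absolute mean of A (balanced design): {absolute_a:.3f}")
print("decision (balanced):", "A" if balanced_est < 0 else "B", "is faster")
```

In this toy model the blocked ordering charges the accumulated drift to whichever program runs second, so its contrast takes the wrong sign. The balanced ordering exposes both programs to the same average state, so the shared bias cancels in the difference even though each absolute mean remains inflated.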