When trying to solve a computational problem, we are often faced with a choice between algorithms that are guaranteed to return the right answer but differ in their runtime distributions (e.g., SAT solvers, sorting algorithms). This paper aims to lay theoretical foundations for such choices by formalizing preferences over runtime distributions. It might seem that we should simply prefer the algorithm that minimizes expected runtime. However, such preferences would be driven by exactly how slow our algorithm is on bad inputs, whereas in practice we are typically willing to cut off occasional, sufficiently long runs before they finish. We propose a principled alternative: a utility-theoretic approach that characterizes the scoring functions describing preferences over algorithms. These functions depend on how the value of solving our problem decreases with time and on the distribution from which captimes (the times at which we would cut off a run) are drawn. We describe examples of realistic utility functions and show how to leverage a maximum-entropy approach for modeling underspecified captime distributions. Finally, we show how to efficiently estimate an algorithm's expected utility from runtime samples.
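The last step above, estimating expected utility from runtime samples, can be illustrated with a minimal Python sketch. It rests on assumptions of our own, not necessarily the paper's definitions: a run that finishes at time T before its captime K earns u(T), a run cut off first earns zero, and K is independent of T. Under those assumptions an algorithm's score is E[u(T) · Pr(K ≥ T)], and the sample mean of u(t_i) · Pr(K ≥ t_i) over i.i.d. runtime samples t_i is an unbiased estimate of it. For K the sketch uses an exponential distribution with a given mean, which is the maximum-entropy distribution on [0, ∞) when only the mean is known; this is one illustrative instance of the maximum-entropy idea, not the paper's model. The function names and sample runtimes are hypothetical.

```python
import math
from typing import Callable, Sequence


def exponential_captime_survival(mean_captime: float) -> Callable[[float], float]:
    """Return t -> Pr(K >= t) for an exponential captime K with the given mean.

    Assumption for this sketch: only the mean captime is known, so we use the
    maximum-entropy distribution on [0, inf) with that mean (the exponential).
    """
    if mean_captime <= 0:
        raise ValueError("mean_captime must be positive")
    return lambda t: math.exp(-t / mean_captime)


def estimate_expected_utility(
    runtimes: Sequence[float],
    utility: Callable[[float], float],
    captime_survival: Callable[[float], float],
) -> float:
    """Sample-mean estimate of E[u(T) * Pr(K >= T)] from runtime samples.

    Assumptions (ours, not necessarily the paper's): a run cut off before it
    finishes earns zero utility, and the captime K is independent of T.
    """
    if not runtimes:
        raise ValueError("need at least one runtime sample")
    total = sum(utility(t) * captime_survival(t) for t in runtimes)
    return total / len(runtimes)


if __name__ == "__main__":
    survival = exponential_captime_survival(mean_captime=5.0)
    # Value of a solution does not decay, so the score is Pr(finish before cap).
    constant_utility = lambda t: 1.0
    runs_a = [4.0, 4.0, 4.0]    # hypothetical: steady, mean runtime 4
    runs_b = [0.5, 0.5, 100.0]  # hypothetical: usually fast, one very slow run
    print("A:", estimate_expected_utility(runs_a, constant_utility, survival))
    print("B:", estimate_expected_utility(runs_b, constant_utility, survival))
```

In this toy comparison, algorithm B has a far worse mean runtime (about 33.7 vs 4) yet scores higher (about 0.60 vs 0.45), because its one very slow run would almost surely have been cut off anyway. This mirrors the abstract's point that expected runtime over-weights bad inputs.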