An experimental comparison of two or more optimization algorithms requires that the same computational resources be assigned to each algorithm. When a maximum runtime is used as the stopping criterion, all algorithms must be executed on the same machine to use the same resources. Unfortunately, the implementation code of the algorithms is not always available, so running the algorithms to be compared on the same machine is not always possible. Even when the code is available, some optimization algorithms can be costly to run, such as training large neural networks in the cloud. In this paper, we consider the following problem: how do we compare the performance of a new optimization algorithm B with a known algorithm A from the literature if, for algorithm A, we only have the results (the objective values) and the runtime on each instance? Specifically, we present a methodology that enables a statistical analysis of the performance of algorithms executed on different machines. The proposed methodology has two parts. First, we propose a model that, given the runtime of an algorithm on one machine, estimates the runtime of the same algorithm on another machine. This model can be adjusted so that the probability of estimating a runtime longer than it should be is arbitrarily low. Second, we introduce an adaptation of the one-sided sign test that uses a modified \textit{p}-value and takes that probability into account. This adaptation avoids the increase in the probability of type I error that would otherwise result from executing algorithms A and B on different machines.
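To make the second part of the methodology concrete, the following is a minimal Python sketch of a standard one-sided sign test on paired objective values, followed by a crude union-bound correction. It is not the modified \textit{p}-value proposed in the paper, whose exact form is not given in this abstract. The function names, the minimization convention (lower objective value is better), and the parameter `gamma` (an assumed per-instance upper bound on the probability that the estimated runtime is too long) are all illustrative assumptions.

```python
from math import comb


def sign_test_p_value(scores_a, scores_b):
    """Standard one-sided sign test for H1: B outperforms A.

    Assumes minimization: B wins on an instance when its objective
    value is strictly lower than that of A. Ties are discarded.
    """
    wins_b = sum(b < a for a, b in zip(scores_a, scores_b))
    losses_b = sum(b > a for a, b in zip(scores_a, scores_b))
    n = wins_b + losses_b  # number of non-tied instances
    if n == 0:
        return 1.0
    # P(X >= wins_b) for X ~ Binomial(n, 1/2)
    return sum(comb(n, k) for k in range(wins_b, n + 1)) / 2 ** n


def conservative_p_value(p, n_instances, gamma):
    """Hypothetical union-bound correction (NOT the paper's formula).

    Adds the probability that at least one of the n_instances runtime
    estimates is too long, bounded above by n_instances * gamma.
    """
    return min(1.0, p + n_instances * gamma)


# Made-up objective values on 8 instances, for illustration only.
scores_a = [10, 12, 9, 15, 11, 14, 13, 10]
scores_b = [8, 11, 9, 12, 10, 13, 14, 9]

p = sign_test_p_value(scores_a, scores_b)
p_adj = conservative_p_value(p, n_instances=len(scores_a), gamma=0.001)
print(p, p_adj)
```

In this toy data B wins on 6 instances, loses on 1, and ties on 1, so the unadjusted \textit{p}-value is 8/128 = 0.0625. The correction can only increase the \textit{p}-value, which reflects the idea that uncertainty in the runtime estimate should not make it easier to reject the null hypothesis.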