An experimental comparison of two or more optimization algorithms requires the same computational resources to be assigned to each algorithm. When a maximum runtime is set as the stopping criterion, all algorithms need to be executed on the same machine if they are to use the same resources. Unfortunately, the implementation code of the algorithms is not always available, so running the algorithms to be compared on the same machine is not always possible. Even when the code is available, some optimization algorithms are costly to run, for example when training large neural networks in the cloud. In this paper, we consider the following problem: how can we compare the performance of a new optimization algorithm B with that of an algorithm A known from the literature if, for algorithm A, we only have the results (the objective values) and the runtime on each instance? Specifically, we present a methodology that enables a statistical analysis of the performance of algorithms executed on different machines. The proposed methodology has two parts. First, we propose a model that, given the runtime of an algorithm on one machine, estimates the runtime of the same algorithm on another machine. This model can be adjusted so that the probability of estimating a runtime longer than it should be is arbitrarily low. Second, we introduce an adaptation of the one-sided sign test that uses a modified p-value and takes that probability into account. This adaptation avoids increasing the probability of type I error associated with executing algorithms A and B on different machines.
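For orientation, the following is a minimal sketch of the standard (unmodified) one-sided sign test that the second part of the methodology builds on. It is not the adapted test with the modified p-value, whose definition is not given here. The function name `one_sided_sign_test`, its parameters, and the convention of discarding ties are assumptions made for illustration only.

```python
from math import comb


def one_sided_sign_test(scores_a, scores_b, minimize=True):
    """Standard one-sided sign test (illustrative sketch, not the adapted test).

    H0: B is not better than A (B wins on an instance with probability <= 0.5).
    H1: B is better than A.
    scores_a, scores_b: paired objective values, one pair per instance.
    Returns the p-value P(X >= wins_b) with X ~ Binomial(n, 0.5).
    """
    wins_b = 0
    n = 0
    for a, b in zip(scores_a, scores_b):
        if a == b:
            continue  # assumption: ties are discarded
        n += 1
        # B wins if it has the lower value (minimization) or the higher one (maximization)
        if (b < a) == minimize:
            wins_b += 1
    if n == 0:
        return 1.0  # no informative pairs, so there is no evidence against H0
    # Upper tail of the Binomial(n, 0.5) distribution
    return sum(comb(n, k) for k in range(wins_b, n + 1)) / 2 ** n


if __name__ == "__main__":
    # Hypothetical objective values on eight instances (minimization)
    a = [10, 12, 9, 15, 11, 14, 13, 16]
    b = [9, 11, 8, 14, 10, 13, 12, 15]
    print(one_sided_sign_test(a, b))
```

In this plain form, the test assumes that A and B were given equivalent resources on every instance; the adaptation described above exists precisely because that assumption may fail when the runtime of A on another machine is only estimated.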