In large-scale benchmarking of stochastic optimization algorithms, the key challenge is no longer whether repeated runs are needed for reliability, but how to determine when sufficient evidence has been collected without incurring unnecessary computational cost. We study a learning-based extension of a recent empirical online heuristic that adaptively estimates the required number of runs using outlier handling and skewness-based symmetry checks. Using annotated outcomes from 132,000 Nevergrad runs on COCO (24 problems in 20 dimensions, 10 instances each, 11 optimizers), we train classifiers on 23 statistical, energy-free, and shape and stability features to predict whether a run-number estimate is reliable, prioritizing detection of incorrect estimates via minority-class recall. We evaluate reliability prediction using a within-configuration learning setup, where models are trained and tested on data sharing the same optimizer. The results show that run-number reliability can be learned in a within-configuration scenario, enabling detection of unreliable estimates with high minority-class recall, although performance remains limited by the restricted data diversity within fixed configurations.
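The evaluation criterion named above, minority-class recall computed separately for each optimizer, can be illustrated with a minimal sketch. It is not the authors' pipeline: the label encoding (0 = unreliable estimate, assumed to be the minority class; 1 = reliable), the optimizer names `OptA` and `OptB`, and the toy predictions are all hypothetical. The final lines also check the arithmetic implied by the stated setup, under the assumption (not stated in the abstract) that runs are spread evenly across configurations.

```python
from collections import defaultdict


def minority_class_recall(y_true, y_pred, minority_label=0):
    """Fraction of truly minority-class items that were predicted as minority."""
    tp = sum(1 for t, p in zip(y_true, y_pred)
             if t == minority_label and p == minority_label)
    fn = sum(1 for t, p in zip(y_true, y_pred)
             if t == minority_label and p != minority_label)
    if tp + fn == 0:
        return float("nan")  # no minority-class items in this group
    return tp / (tp + fn)


def recall_per_optimizer(records, minority_label=0):
    """Score each optimizer's records separately (within-configuration view).

    `records` is an iterable of (optimizer, true_label, predicted_label).
    """
    groups = defaultdict(lambda: ([], []))
    for optimizer, t, p in records:
        groups[optimizer][0].append(t)
        groups[optimizer][1].append(p)
    return {opt: minority_class_recall(ts, ps, minority_label)
            for opt, (ts, ps) in groups.items()}


# Hypothetical toy data: 0 = unreliable estimate (assumed minority), 1 = reliable.
records = [
    ("OptA", 0, 0), ("OptA", 0, 1), ("OptA", 1, 1), ("OptA", 1, 1),
    ("OptB", 0, 0), ("OptB", 1, 0), ("OptB", 1, 1),
]
per_optimizer = recall_per_optimizer(records)

# Arithmetic implied by the abstract, ASSUMING an even split of runs:
# 24 problems x 10 instances x 11 optimizers = 2,640 configurations.
n_configurations = 24 * 10 * 11
runs_per_configuration = 132_000 // n_configurations
```

In this toy data, `OptA` has two truly unreliable estimates of which one is flagged, and `OptB` has one that is flagged. The false alarm on `OptB` does not lower its recall, which is why a recall-oriented objective favors catching incorrect estimates over avoiding false alarms.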