Randomness as Reference: Benchmark Metric for Optimization in Engineering

Benchmarking optimization algorithms is fundamental for the advancement of computational intelligence. However, widely adopted artificial test suites exhibit limited correspondence with the diversity and complexity of real-world engineering optimization tasks. This paper presents a new benchmark suite comprising 235 bounded, continuous, unconstrained optimization problems, the majority derived from engineering design and simulation scenarios, including computational fluid dynamics and finite element analysis models. In conjunction with this suite, a novel performance metric is introduced, which employs random sampling as a statistical reference, providing nonlinear normalization of objective values and enabling unbiased comparison of algorithmic efficiency across heterogeneous problems. Using this framework, 20 deterministic and stochastic optimization methods were systematically evaluated through hundreds of independent runs per problem, ensuring statistical robustness. The results indicate that only a few of the tested optimization methods consistently achieve excellent performance, while several commonly used metaheuristics exhibit severe efficiency loss on engineering-type problems, emphasizing the limitations of conventional benchmarks. Furthermore, the conducted tests are used for analyzing various features of the optimization methods, providing practical guidelines for their application. The proposed test suite and metric together offer a transparent, reproducible, and practically relevant platform for evaluating and comparing optimization methods, thereby narrowing the gap between the available benchmark tests and realistic engineering applications.
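
The abstract does not define the metric, so the following is only a minimal sketch of one way random sampling could serve as a statistical reference. It assumes minimization and scores a result by the fraction of uniformly random points within the bounds that achieve a lower objective value, which is a nonlinear (rank-based) normalization that is comparable across problems with different objective scales. The names `random_reference`, `normalized_score`, and `sphere`, the sample size, and the scoring rule itself are illustrative assumptions, not the definition used in the paper.

```python
import random
from bisect import bisect_left


def random_reference(objective, bounds, n_samples=10_000, seed=0):
    """Return sorted objective values at uniformly random points in the bounds.

    Hypothetical helper: the sample size and sampling scheme are assumptions.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(n_samples):
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        values.append(objective(x))
    values.sort()
    return values


def normalized_score(f_best, reference):
    """Fraction of random samples strictly better (lower) than f_best.

    0.0 means better than every random sample; 1.0 means worse than all.
    This empirical-CDF normalization is an assumed stand-in for the metric.
    """
    return bisect_left(reference, f_best) / len(reference)


def sphere(x):
    """Toy bounded test objective, used only for illustration."""
    return sum(xi * xi for xi in x)


bounds = [(-5.0, 5.0)] * 3
reference = random_reference(sphere, bounds)

print(normalized_score(0.0, reference))  # global optimum of the toy problem
print(normalized_score(1e9, reference))  # far worse than any sample in the bounds
```

Because the score depends only on the rank of a result among random samples, two problems whose objective values differ by orders of magnitude map onto the same [0, 1] scale, which is the kind of cross-problem comparability the abstract describes.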