Rapid progress in aerodynamic shape optimization (ASO) has outpaced the standardized evaluation frameworks currently available. Fair comparison requires a unified benchmark that spans diverse shape classes and objective formulations and includes state-of-the-art baselines run under matched budgets. We introduce ShapeBench, an open-source ASO benchmark with a unified API spanning 103 tasks across eight shape categories and multiple optimization regimes. Each ShapeBench task includes a validated surrogate for fast search; where feasible, a high-fidelity Computational Fluid Dynamics (CFD) pipeline is also available for final verification, enabling systematic fidelity-gap analysis. ShapeBench provides a reproducible protocol with well-configured baselines and a consistent budget metric, so that classical and LLM-driven methods, including general-purpose optimizers and a new domain-specialized evolutionary LLM baseline, ShapeEvolve, can be compared fairly. Results on ShapeBench show that optimizer rankings vary substantially across shape categories and problem formulations, with a mean pairwise Spearman correlation of $\rho = 0.013$, so single-task conclusions do not reliably generalize across problem classes. The benchmark is also far from saturated: classical methods are rarely applicable across all shape categories and tasks, which further highlights the need for more general-purpose approaches.
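As a reading aid for the mean pairwise Spearman statistic quoted above, the following minimal sketch shows one plausible way to compute it. It assumes that each task yields one score per optimizer and that the statistic is the average Spearman correlation over all pairs of tasks; the abstract does not specify the exact procedure. The function names and the toy scores are hypothetical and are not taken from ShapeBench.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(vx * vy)


def mean_pairwise_spearman(scores_by_task):
    """Average Spearman correlation over all unordered pairs of tasks.

    scores_by_task maps a task name to a list of scores, one per optimizer,
    with optimizers listed in the same order for every task (an assumption).
    """
    pairs = list(combinations(scores_by_task.values(), 2))
    if not pairs:
        raise ValueError("need at least two tasks")
    return sum(spearman(a, b) for a, b in pairs) / len(pairs)


# Hypothetical scores for four optimizers on three made-up tasks.
toy_scores = {
    "task_a": [1.0, 2.0, 3.0, 4.0],
    "task_b": [4.0, 3.0, 2.0, 1.0],  # ranking reversed relative to task_a
    "task_c": [1.0, 2.0, 3.0, 4.0],  # same ranking as task_a
}
toy_mean_rho = mean_pairwise_spearman(toy_scores)
```

In this toy example two of the three task pairs have perfectly reversed rankings and one pair agrees exactly, so the mean is -1/3. A value near zero, as reported for ShapeBench, would indicate that an optimizer's rank on one task says almost nothing about its rank on another.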