Accurate and efficient surrogate modeling is essential for modern computational science, and there is a staggering number of emulation methods to choose from. With new methods being developed all the time, comparing the relative strengths and weaknesses of different methods remains challenging because benchmarking practices are inconsistent and reproducibility and transparency are sometimes limited. In this work, we present a large-scale, fully reproducible comparison of $29$ distinct emulators across $60$ canonical test functions and $40$ real emulation datasets. To facilitate rigorous, apples-to-apples comparisons, we introduce the R package \texttt{duqling}, which streamlines reproducible simulation studies through a simple, consistent syntax and automatic internal scaling of inputs. This framework allows researchers to compare emulators in a unified environment and makes it possible to replicate or extend previous studies with minimal effort, even across different publications. Our results provide detailed empirical insight into the strengths and weaknesses of state-of-the-art emulators and offer guidance both for method developers and for practitioners selecting a surrogate for new data. We discuss best practices for emulator comparison and highlight how \texttt{duqling} can accelerate research in emulator design and application.