Calibrating simulation models that take large quantities of multi-dimensional data as input is a hard simulation optimization problem. Existing adaptive sampling strategies offer a methodological solution. However, under extreme noise levels and heteroskedastic system responses, they may not sufficiently reduce the computational cost of estimation or let the solution algorithm make enough progress within a limited budget. We propose integrating stratification with adaptive sampling to make optimization more efficient. Stratification can exploit local dependence in the simulation inputs and outputs, yet the state of the art does not provide a full capability to adaptively stratify the data as different solution alternatives are evaluated. We devise two procedures for data-driven calibration problems in which a model must be calibrated against a large dataset with multiple covariates within a fixed overall simulation budget. The first procedure dynamically stratifies the input data using binary trees, while the second uses closed-form solutions based on an assumed linear relationship between the objective function and concomitant variables. We find that dynamically adjusting the stratification structure accelerates optimization and reduces run-to-run variability in the generated solutions. In a case study calibrating a wind power simulation model widely used in the wind industry, the proposed stratified adaptive sampling yields better-calibrated parameters under a limited budget.
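To illustrate why stratification can help when responses are heteroskedastic, the following is a minimal sketch and not the procedures proposed here: it uses fixed, hand-chosen strata on a single covariate rather than adaptive binary trees or concomitant variables. The response function `simulate_error`, the noise model, the stratum edges, and the pilot size are all hypothetical choices made for illustration. It compares a simple random sampling estimate of a calibration objective with a stratified estimate that spends a small pilot budget on per-stratum standard deviations and allocates the rest by Neyman allocation.

```python
import numpy as np

def simulate_error(x, theta, rng):
    """Hypothetical noisy response: squared error plus noise whose spread grows with x."""
    noise = rng.normal(0.0, 0.05 + 3.0 * x**4)  # heteroskedastic noise (assumed form)
    return (theta - x) ** 2 + noise

def srs_estimate(data, theta, budget, rng):
    """Simple random sampling: spend the whole budget on uniformly drawn data points."""
    x = rng.choice(data, size=budget)
    return float(simulate_error(x, theta, rng).mean())

def stratified_estimate(data, theta, budget, edges, rng, pilot=10):
    """Stratified estimate with a pilot stage followed by Neyman allocation."""
    labels = np.digitize(data, edges)  # stratum index of each data point
    strata = [data[labels == k] for k in range(len(edges) + 1)]
    weights = np.array([len(s) for s in strata]) / len(data)

    # Pilot stage: a few runs per stratum to estimate within-stratum spread.
    pilot_out = [simulate_error(rng.choice(s, size=pilot), theta, rng) for s in strata]
    sds = np.array([p.std(ddof=1) for p in pilot_out])

    # Neyman allocation: remaining runs proportional to weight * standard deviation.
    remaining = budget - pilot * len(strata)
    share = weights * sds
    extra = np.floor(remaining * share / share.sum()).astype(int)

    means = []
    for s, p, n in zip(strata, pilot_out, extra):
        out = np.concatenate([p, simulate_error(rng.choice(s, size=n), theta, rng)])
        means.append(out.mean())
    return float(np.dot(weights, means))

rng = np.random.default_rng(0)
data = rng.uniform(0.0, 1.0, size=5000)  # stand-in for a large input dataset
theta, budget = 0.5, 200                 # candidate parameter and per-evaluation budget
edges = [0.25, 0.5, 0.75]                # fixed stratum boundaries (assumed)

srs = np.array([srs_estimate(data, theta, budget, rng) for _ in range(300)])
strat = np.array([stratified_estimate(data, theta, budget, edges, rng) for _ in range(300)])
var_srs, var_strat = srs.var(ddof=1), strat.var(ddof=1)
print(f"SRS variance: {var_srs:.5f}  stratified variance: {var_strat:.5f}")
```

In this toy setting most of the noise sits in the highest stratum, so directing more runs there lowers the variance of the objective estimate for the same budget. An adaptive scheme would additionally revise the strata themselves as different candidate parameters are evaluated.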