Calibrating simulation models that take large quantities of multi-dimensional data as input is a hard simulation optimization problem. Existing adaptive sampling strategies offer a methodological solution, but under extreme noise levels and heteroskedastic system responses they may not reduce the computational cost of estimation enough for the solution algorithm to make adequate progress within a limited budget. We propose integrating stratification with adaptive sampling to make optimization more efficient. Stratification can exploit local dependence in the simulation inputs and outputs, yet the state of the art does not provide a full capability to adaptively stratify the data as different solution alternatives are evaluated. We devise two procedures for data-driven calibration problems in which a model must be calibrated against a large dataset with multiple covariates within a fixed overall simulation budget. The first dynamically stratifies the input data using binary trees; the second uses closed-form solutions based on an assumed linear relationship between the objective function and concomitant variables. We find that dynamically adjusting the stratification structure accelerates optimization and reduces run-to-run variability in the generated solutions. In a case study calibrating a wind power simulation model widely used in the wind industry, the proposed stratified adaptive sampling yields better-calibrated parameters under a limited budget.
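To make the motivation for stratification concrete, the following is a minimal sketch, not the procedures proposed here: it uses a fixed, hand-chosen split on a single covariate instead of adaptive binary trees or closed-form allocations. The response function `simulate_error`, the split point, and all parameter values are hypothetical and chosen only for illustration. It compares a simple random sampling estimate of the objective with a proportionally allocated stratified estimate when the response is heteroskedastic in the covariate.

```python
import bisect
import random
import statistics


def simulate_error(x, theta, rng):
    """Hypothetical noisy response: the mean depends on the covariate x and
    the noise standard deviation grows with x (heteroskedasticity)."""
    mean = (theta - 2.0) ** 2 + 10.0 * x
    return mean + rng.gauss(0.0, 0.1 + x)


def build_strata(data, edges):
    """Partition covariate values into len(edges) + 1 strata at sorted edges."""
    strata = [[] for _ in range(len(edges) + 1)]
    for x in data:
        strata[bisect.bisect_right(edges, x)].append(x)
    return strata


def srs_estimate(data, theta, budget, rng):
    """Estimate the objective by simple random sampling (with replacement)."""
    draws = [simulate_error(rng.choice(data), theta, rng) for _ in range(budget)]
    return statistics.fmean(draws)


def stratified_estimate(strata, theta, budget, rng):
    """Estimate the objective with proportional allocation across strata.
    Rounding may make the total number of runs differ slightly from budget."""
    total = sum(len(s) for s in strata)
    estimate = 0.0
    for stratum in strata:
        if not stratum:
            continue  # an empty stratum carries zero weight
        weight = len(stratum) / total
        n = max(1, round(budget * weight))
        draws = [simulate_error(rng.choice(stratum), theta, rng) for _ in range(n)]
        estimate += weight * statistics.fmean(draws)
    return estimate


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.random() for _ in range(2000)]  # synthetic covariate values
    strata = build_strata(data, edges=[0.5])    # fixed two-way split (assumption)
    srs = [srs_estimate(data, 2.0, 40, rng) for _ in range(300)]
    strat = [stratified_estimate(strata, 2.0, 40, rng) for _ in range(300)]
    print("variance of SRS estimates:       ", statistics.variance(srs))
    print("variance of stratified estimates:", statistics.variance(strat))
```

In this toy setting the stratified estimator should show lower run-to-run variance for the same budget, because the split removes the between-strata component of the variance. An adaptive scheme would additionally revise the strata and the allocation as different candidate solutions are evaluated.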