In predictive modeling with simulation or machine learning, it is critical to assess the quality of estimated values accurately through output analysis. In recent decades, output analysis has been enriched with methods that quantify the impact of input data uncertainty on the model outputs, which increases robustness. However, most of these developments apply only under the assumption that the input data follow a parametric family of distributions. We propose a unified output analysis framework for simulation and machine learning outputs through the lens of Monte Carlo sampling. This framework provides nonparametric quantification of the variance and bias induced in the outputs, with higher-order accuracy. Our new bias-corrected estimation from the model outputs leverages an extension of fast iterative bootstrap sampling and higher-order influence functions. To make the proposed estimation methods scalable, we devise budget-optimal rules and leverage control variates for variance reduction. Our theoretical and numerical results demonstrate a clear advantage in building more robust confidence intervals from the model outputs, with higher coverage probability.
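To make the idea of nonparametric quantification of input-induced variance and bias concrete, the following is a minimal sketch using a plain single-level bootstrap. It is not the fast iterative bootstrap or the higher-order influence function estimator described in the abstract, and it omits budget allocation and control variates. The functions `model_output` and `bootstrap_output_analysis`, the choice of output (the plug-in rate, 1 / sample mean), and all parameter values are assumptions made purely for illustration.

```python
import random
import statistics


def model_output(inputs):
    # Hypothetical stand-in for a simulation or ML output: a nonlinear
    # functional of the input data (here, the plug-in rate 1 / mean).
    return 1.0 / statistics.fmean(inputs)


def bootstrap_output_analysis(inputs, n_boot=2000, seed=0):
    # Resample the input data with replacement (no parametric family is
    # assumed) and re-evaluate the output on every resample.
    rng = random.Random(seed)
    n = len(inputs)
    point = model_output(inputs)
    replicates = [model_output(rng.choices(inputs, k=n)) for _ in range(n_boot)]

    boot_mean = statistics.fmean(replicates)
    bias = boot_mean - point                     # bootstrap bias estimate
    variance = statistics.variance(replicates)   # input-induced variance estimate
    corrected = point - bias                     # first-order bias-corrected output
    return {
        "point": point,
        "bias": bias,
        "variance": variance,
        "corrected": corrected,
    }


# Illustrative input data: 50 draws from an exponential distribution (assumed).
data_rng = random.Random(1)
data = [data_rng.expovariate(2.0) for _ in range(50)]
result = bootstrap_output_analysis(data)
print(result)
```

A single bootstrap level removes only the leading-order bias term. Iterating the bootstrap is one classical route to higher-order corrections, at a computational cost that grows quickly with each level, which is the kind of cost that budget rules and variance reduction are meant to address.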