Uncertainty quantification, by means of confidence interval (CI) construction, is a fundamental problem in statistics and is also important in risk-aware decision-making. In this paper, we revisit the basic problem of CI construction, but in the setting of expensive black-box models. This means we are confined to using a small number of model runs, without the ability to obtain auxiliary model information such as gradients. In this setting, there exist classical methods based on data splitting, as well as newer methods based on suitable resampling. However, while all the resulting CIs have similarly accurate coverage in large samples, their efficiencies in terms of interval length differ, and a systematic understanding of which method and configuration attains the shortest interval appears open. Motivated by this, we create a theoretical framework to study the statistical optimality of CI tightness under computational constraints. Our theory shows that standard batching, as well as carefully constructed new formulas using uneven-size or overlapping batches, the batched jackknife, and the so-called cheap bootstrap and its weighted generalizations, are all statistically optimal. Our developments build on a new bridge between the classical notion of uniformly most accurate unbiasedness and batching and resampling, by viewing model runs as asymptotically Gaussian "data", as well as a suitable notion of homogeneity for CIs.
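As a brief illustration of the two baseline constructions referenced above, in standard notation introduced here for exposition only (the paper's own symbols may differ): with the model runs split into $m$ even batches yielding batch estimates $\hat\psi_1,\dots,\hat\psi_m$ and batch average $\bar\psi = \frac{1}{m}\sum_{i=1}^m \hat\psi_i$, the standard batching CI takes the form
\[
\bar\psi \pm t_{m-1,\,1-\alpha/2}\,\frac{S}{\sqrt{m}}, \qquad S^2 = \frac{1}{m-1}\sum_{i=1}^m \left(\hat\psi_i - \bar\psi\right)^2,
\]
while the cheap bootstrap, with a full-data estimate $\hat\psi$ and $B$ resample estimates $\hat\psi^{*1},\dots,\hat\psi^{*B}$, takes the form
\[
\hat\psi \pm t_{B,\,1-\alpha/2}\, S_B, \qquad S_B^2 = \frac{1}{B}\sum_{b=1}^B \left(\hat\psi^{*b} - \hat\psi\right)^2,
\]
where $t_{\nu,\,1-\alpha/2}$ denotes the $(1-\alpha/2)$-quantile of the $t$ distribution with $\nu$ degrees of freedom. Both constructions attain asymptotically exact coverage with only a small, fixed number of estimator evaluations, which is precisely the regime of expensive black-box models studied here.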