Many areas of science make extensive use of computer simulators that implicitly encode likelihood functions of complex systems. Classical statistical methods are poorly suited for these so-called likelihood-free inference (LFI) settings, particularly outside asymptotic and low-dimensional regimes. Although new machine learning methods, such as normalizing flows, have revolutionized the sample efficiency and capacity of LFI methods, it remains an open question whether they produce confidence sets with correct conditional coverage for small sample sizes. This paper unifies classical statistics with modern machine learning to present (i) a practical procedure for the Neyman construction of confidence sets with finite-sample guarantees of nominal coverage, and (ii) diagnostics that estimate conditional coverage over the entire parameter space. We refer to our framework as likelihood-free frequentist inference (LF2I). Any method that defines a test statistic, like the likelihood ratio, can leverage the LF2I machinery to create valid confidence sets and diagnostics without costly Monte Carlo samples at fixed parameter settings. We study the power of two test statistics (ACORE and BFF), which, respectively, maximize versus integrate an odds function over the parameter space. Our paper discusses the benefits and challenges of LF2I, with a breakdown of the sources of errors in LF2I confidence sets.
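As an illustration of the Neyman construction described above, the following is a minimal sketch on a toy problem, not the LF2I implementation. It assumes a Gaussian simulator with known variance, so the test statistic is the exact log-likelihood ratio; in a likelihood-free setting that statistic would be replaced by a learned one such as ACORE or BFF. Critical values are estimated from a single batch of simulations spread over the parameter space, with a nearest-neighbour local quantile standing in for whatever quantile estimator one would use in practice. All names (`simulate`, `log_lr_statistic`, `critical_value`, `confidence_set`) and settings are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
SIGMA, N_OBS, ALPHA = 1.0, 5, 0.1  # toy settings, chosen for illustration only


def simulate(theta, n=N_OBS):
    """Toy simulator (assumed): n i.i.d. draws from N(theta, SIGMA^2)."""
    return rng.normal(theta, SIGMA, size=n)


def log_lr_statistic(data, theta0):
    """Exact log-likelihood ratio for H0: theta = theta0 against the MLE.

    Values near 0 support H0; very negative values are evidence against it.
    theta0 may be a scalar or an array of candidate values.
    """
    return -0.5 * len(data) * (np.mean(data) - theta0) ** 2 / SIGMA**2


# Step 1: one simulated data set per theta, with theta spread over the whole
# parameter space instead of many repetitions at a few fixed values.
B = 20_000
train_thetas = rng.uniform(-5.0, 5.0, size=B)
train_data = rng.normal(train_thetas[:, None], SIGMA, size=(B, N_OBS))
train_stats = (
    -0.5 * N_OBS * (train_data.mean(axis=1) - train_thetas) ** 2 / SIGMA**2
)


# Step 2: estimate the level-ALPHA critical value at theta0 as the ALPHA
# quantile of the statistic among the k simulations whose theta is closest.
def critical_value(theta0, k=500):
    nearest = np.argpartition(np.abs(train_thetas - theta0), k)[:k]
    return np.quantile(train_stats[nearest], ALPHA)


grid = np.linspace(-5.0, 5.0, 101)
cutoffs = np.array([critical_value(t) for t in grid])


# Step 3: Neyman inversion. Keep every theta0 whose test is not rejected.
def confidence_set(data):
    return grid[log_lr_statistic(data, grid) >= cutoffs]


observed = simulate(1.0)
region = confidence_set(observed)
print("approximate 90% confidence set:", region.min(), "to", region.max())
```

Because the Gaussian statistic is pivotal, the estimated cutoffs should be roughly constant in theta here. In general they vary across the parameter space, which is why they are estimated as a function of theta rather than fixed once.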