Likelihood-Free Frequentist Inference: Bridging Classical Statistics and Machine Learning in Simulator-Based Inference

Many areas of science rely on computer simulators that implicitly encode intractable likelihood functions of complex systems. Classical statistical methods are poorly suited to these so-called likelihood-free inference (LFI) settings, especially outside asymptotic and low-dimensional regimes. At the same time, traditional LFI methods, such as Approximate Bayesian Computation or more recent machine learning techniques, do not guarantee confidence sets with nominal coverage in general settings (i.e., with high-dimensional data, finite sample sizes, and regardless of the true parameter value). In addition, there are no practical diagnostic tools for checking the empirical coverage of the confidence sets these methods produce across the entire parameter space. In this work, we propose a framework that bridges classical statistics and modern machine learning to provide (i) a practical, modular, and efficient approach to the Neyman construction of confidence sets with frequentist finite-sample coverage for any value of the unknown parameters; and (ii) an interpretable diagnostic tool that estimates the empirical coverage across the entire parameter space. We refer to the general framework as likelihood-free frequentist inference (LF2I). Any method that defines a test statistic can leverage LF2I to create valid confidence sets and diagnostics without costly Monte Carlo samples at fixed parameter settings. We study the power of two likelihood-based test statistics (ACORE and BFF) and validate their empirical performance in several experimental settings.
