Likelihood-Free Frequentist Inference: Bridging Classical Statistics and Machine Learning for Reliable Simulator-Based Inference

Source: arXiv. 45 pages, 6 figures. Code available at https://github.com/lee-group-cmu/lf2i; supplementary material available at https://lucamasserano.github.io/data/LF2I_supplementary_material.pdf

Many areas of science make extensive use of computer simulators that implicitly encode intractable likelihood functions of complex systems. Classical statistical methods are poorly suited for these so-called likelihood-free inference (LFI) settings, especially outside asymptotic and low-dimensional regimes. At the same time, traditional LFI methods, such as Approximate Bayesian Computation or more recent machine learning techniques, do not guarantee confidence sets with nominal coverage in general settings (i.e., with high-dimensional data, finite sample sizes, and for any parameter value). In addition, there are no diagnostic tools to check the empirical coverage of the confidence sets provided by such methods across the entire parameter space. In this work, we propose a unified and modular inference framework that bridges classical statistics and modern machine learning, providing (i) a practical approach to the Neyman construction of confidence sets with frequentist finite-sample coverage for any value of the unknown parameters; and (ii) interpretable diagnostics that estimate the empirical coverage across the entire parameter space. We refer to the general framework as likelihood-free frequentist inference (LF2I). Any method that defines a test statistic can leverage LF2I to create valid confidence sets and diagnostics without costly Monte Carlo samples at fixed parameter settings. We study the power of two likelihood-based test statistics (ACORE and BFF) and demonstrate their empirical performance on high-dimensional, complex data. Code is available at https://github.com/lee-group-cmu/lf2i.
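To make the two ingredients named in the abstract concrete, the following is a minimal toy sketch of a Neyman construction by test inversion, followed by an empirical coverage check across the parameter space. Everything in it is an assumption made for illustration and is not taken from the paper or from the lf2i package: the Gaussian simulator, the statistic in `stat`, the helper names, and the binned empirical quantile used as a crude stand-in for a proper quantile estimator. Note that parameters are drawn across the whole space and pooled, rather than simulating a large batch at each fixed parameter value.

```python
import numpy as np

rng = np.random.default_rng(0)
ALPHA = 0.10          # target miscoverage; nominal coverage is 1 - ALPHA
N_OBS = 10            # observations per simulated data set (assumed)
LO, HI = -5.0, 5.0    # toy one-dimensional parameter space (assumed)
N_BINS = 20           # bins used to pool simulations over nearby parameters

def simulate(theta):
    """Hypothetical simulator: N_OBS draws from N(theta, 1) per parameter."""
    theta = np.atleast_1d(theta)
    return rng.normal(theta[:, None], 1.0, size=(theta.size, N_OBS))

def stat(data, theta0):
    """Toy test statistic; small values support the hypothesis theta = theta0."""
    return N_OBS * (data.mean(axis=-1) - theta0) ** 2

edges = np.linspace(LO, HI, N_BINS + 1)

def bin_index(theta):
    """Map parameter values to bin indices in 0..N_BINS-1."""
    return np.clip(np.digitize(theta, edges) - 1, 0, N_BINS - 1)

# Step 1: estimate critical values C(theta) from one simulated data set per
# parameter draw, using the (1 - ALPHA) empirical quantile within each bin.
theta_train = rng.uniform(LO, HI, size=20_000)
lam_train = stat(simulate(theta_train), theta_train)
train_bins = bin_index(theta_train)
cutoffs = np.array(
    [np.quantile(lam_train[train_bins == b], 1 - ALPHA) for b in range(N_BINS)]
)

# Step 2: Neyman construction. Keep every grid value whose test is not rejected.
def confidence_set(data, grid):
    lam = stat(data, grid)
    return grid[lam <= cutoffs[bin_index(grid)]]

theta_true = 1.5                       # used only to generate the "observed" data
observed = simulate(theta_true)[0]
grid = np.linspace(LO, HI, 401)
cs = confidence_set(observed, grid)
print(f"confidence set spans [{cs.min():.2f}, {cs.max():.2f}]")

# Step 3: coverage diagnostic. On fresh simulations, record whether the true
# parameter falls inside its own confidence set, then average within each bin.
theta_test = rng.uniform(LO, HI, size=20_000)
covered = stat(simulate(theta_test), theta_test) <= cutoffs[bin_index(theta_test)]
test_bins = bin_index(theta_test)
coverage = np.array([covered[test_bins == b].mean() for b in range(N_BINS)])
print("estimated coverage per bin:", np.round(coverage, 2))
```

In this toy model the per-bin coverage estimates should sit close to the nominal level 1 - ALPHA. In a realistic setting the statistic would itself be learned from simulations, and the binning would likely be replaced by a regression over the parameter space.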
