This paper, Part 1 of a two-part series, considers simulation-based inference with learned summary statistics, in which the learned summary statistic serves as an empirical likelihood with ameliorative effects in the Bayesian setting when the exact likelihood function associated with the observation data and the simulation model is difficult to obtain in closed form or is computationally intractable. In particular, a transformation technique that leverages the Cressie-Read discrepancy criterion under moment restrictions is used to summarize the learned statistics between the observation data and the simulation outputs while preserving the statistical power of the inference. This data-to-learned-summary-statistics transformation also allows the simulation outputs to be conditioned on the observation data, so that the inference task can be performed over particular sample sets of the observation data that are considered empirically relevant or believed to be of particular importance. Moreover, the simulation-based inference framework discussed in this paper can be extended to handle weakly dependent observation data. Finally, we remark that the inference framework is suitable for implementation in distributed computing: the computational tasks involving both the data-to-learned-summary-statistics transformation and the Bayesian inference problem can be posed as a unified distributed inference problem that exploits distributed optimization and MCMC algorithms to support large datasets associated with complex simulation models.
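To make the Cressie-Read discrepancy under a moment restriction concrete, the following is a minimal sketch and not the paper's method. It assumes one common parametrization of the Cressie-Read family over probability weights, CR_lam(w) = 2 / (lam (lam + 1)) * sum_i ((n w_i)^(-lam) - 1); the paper may use a different convention. It also assumes a toy scalar moment function g(x, theta) = x - theta in place of learned summary statistics. The function names `cressie_read` and `exponential_tilting_weights` are hypothetical. The weights are computed for the lam -> -1 member of the family (exponential tilting), whose minimizer under the restriction sum_i w_i g_i = 0 has the form w_i proportional to exp(t g_i).

```python
import math


def cressie_read(weights, lam):
    """Cressie-Read discrepancy between `weights` and the uniform weights 1/n.

    Assumed parametrization: 2 / (lam * (lam + 1)) * sum((n * w)**(-lam) - 1).
    The limits lam -> 0 (empirical likelihood) and lam -> -1 (exponential
    tilting / Kullback-Leibler) are handled separately.
    """
    n = len(weights)
    if abs(lam) < 1e-12:
        return -2.0 * sum(math.log(n * w) for w in weights)
    if abs(lam + 1.0) < 1e-12:
        return 2.0 * sum(n * w * math.log(n * w) for w in weights)
    scale = 2.0 / (lam * (lam + 1.0))
    return scale * sum((n * w) ** (-lam) - 1.0 for w in weights)


def exponential_tilting_weights(g, tol=1e-12, max_iter=100):
    """Weights w_i proportional to exp(t * g_i) satisfying sum(w_i * g_i) = 0.

    `g` holds scalar moment-function values g(x_i, theta). The tilt t is found
    by Newton's method on f(t) = sum(g_i * exp(t * g_i)).
    """
    if not (min(g) < 0.0 < max(g)):
        raise ValueError("zero must lie strictly inside the range of g")
    t = 0.0
    for _ in range(max_iter):
        e = [math.exp(t * gi) for gi in g]
        f = sum(gi * ei for gi, ei in zip(g, e))
        f_prime = sum(gi * gi * ei for gi, ei in zip(g, e))
        step = f / f_prime
        t -= step
        if abs(step) < tol:
            break
    e = [math.exp(t * gi) for gi in g]
    total = sum(e)
    return [ei / total for ei in e]


# Toy data and an illustrative moment restriction E[x - theta] = 0.
x = [1.0, 2.0, 4.0]
theta = 2.0
g = [xi - theta for xi in x]
w = exponential_tilting_weights(g)
discrepancy = cressie_read(w, lam=-1.0)
```

In this sketch the discrepancy is zero when theta equals the sample mean, where the uniform weights already satisfy the restriction, and it grows as theta moves away from the mean. This is the behaviour that lets such a criterion act as a likelihood surrogate for theta.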