When a statistical model $\{P_θ : θ\in Θ\}$ lacks an analytically tractable likelihood, parametric inference from data generated by an unknown underlying distribution $P$ can still be performed, provided that one can simulate from the model. This approach is called Simulation-Based Inference (SBI). Statistical models are rarely exactly correct (that is, $P \notin \{P_θ: θ\in Θ\}$), and Robust SBI aims to infer a reasonable parameter even under model misspecification. We focus on the setting where $P$ may differ from $P_{θ^*}$ through both geometric and Total Variation-type discrepancies. For this problem, we use a Kullback-Leibler informed robust Optimal Transport divergence, motivated by Empirical Likelihood considerations. We introduce a stochastic sub-gradient ascent algorithm, with a convergence guarantee, for estimating the semi-discrete version of this robust Optimal Transport divergence. We also design a parallelized SBI algorithm that applies the regular bootstrap on top of minimum semi-discrete robust Optimal Transport estimation to quantify parameter uncertainty. We show mathematically why the divergence is robust under joint geometric and Total Variation-type contamination, and then illustrate the robustness of the resulting inferences on a complex benchmark SBI task.
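To make the idea of stochastic sub-gradient ascent on a semi-discrete transport problem concrete, here is a minimal sketch for the classical, non-robust semi-discrete Optimal Transport dual with squared Euclidean cost. It is not the Kullback-Leibler informed robust divergence described above, whose exact form is not given here. The function names `sgd_semidiscrete_dual` and `dual_value`, the $1/\sqrt{t}$ step size, and the iterate averaging are all illustrative assumptions. The dual objective is $\mathbb{E}_{X\sim\mu}[\min_j (c(X, y_j) - v_j)] + \sum_j \nu_j v_j$, and a sub-gradient at a sample $x$ is $\nu - e_{j^*(x)}$, where $j^*(x)$ is the minimizing index.

```python
import numpy as np


def sgd_semidiscrete_dual(sample_mu, y, nu, n_iter=20000, step0=1.0, seed=0):
    """Averaged stochastic sub-gradient ascent on the classical semi-discrete OT dual.

    sample_mu: callable taking a numpy Generator and returning one draw of shape (d,)
    y:         atoms of the discrete measure, shape (n, d)
    nu:        weights of the atoms, shape (n,), summing to 1
    (All names and defaults are hypothetical, chosen for illustration.)
    """
    rng = np.random.default_rng(seed)
    v = np.zeros(len(nu))      # dual potentials, one per atom
    v_avg = np.zeros(len(nu))  # running average of the iterates
    for t in range(1, n_iter + 1):
        x = sample_mu(rng)
        # Atom minimizing the shifted squared Euclidean cost c(x, y_j) - v_j.
        j = np.argmin(np.sum((y - x) ** 2, axis=1) - v)
        # Sub-gradient of the sampled dual objective: nu - e_j.
        grad = nu.copy()
        grad[j] -= 1.0
        v += step0 / np.sqrt(t) * grad
        v_avg += (v - v_avg) / t
    return v_avg


def dual_value(x_samples, y, nu, v):
    """Monte Carlo estimate of the dual objective at potentials v."""
    cost = np.sum((x_samples[:, None, :] - y[None, :, :]) ** 2, axis=2)
    return np.mean(np.min(cost - v, axis=1)) + nu @ v


if __name__ == "__main__":
    # Toy problem: mu = N(0, 1), atoms at -1 and +1 with equal weight.
    # The exact transport cost here is E[(|X| - 1)^2] = 2 - 2 * sqrt(2 / pi).
    y = np.array([[-1.0], [1.0]])
    nu = np.array([0.5, 0.5])
    v_hat = sgd_semidiscrete_dual(lambda rng: rng.standard_normal(1), y, nu)
    xs = np.random.default_rng(1).standard_normal((20000, 1))
    print(dual_value(xs, y, nu, v_hat), 2 - 2 * np.sqrt(2 / np.pi))
```

Because only draws from $\mu$ are needed, $\mu$ can be a simulator output $P_θ$ and the atoms can be the observed data, which is what makes the semi-discrete formulation a natural fit for SBI. A robust variant would replace the objective and its sub-gradient, not the overall loop structure.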