When a statistical model $\{P_θ : θ\in Θ\}$ lacks an analytically tractable likelihood, parametric statistical inference based on data generated from an unknown underlying distribution $P$ can still be performed as long as simulation from the model is possible. This approach is called Simulation-Based Inference (SBI). Statistical models are rarely exactly correct (that is, $P \notin \{P_θ: θ\in Θ\}$), and Robust SBI focuses on inferring a reasonable parameter even under model misspecification. We focus on the setting where $P$ may exhibit both geometric and Total Variation-type discrepancies from $P_{θ^*}$. For this problem, we use a Kullback-Leibler-informed robust Optimal Transport divergence, motivated by Empirical Likelihood considerations. We introduce a stochastic sub-gradient ascent algorithm, with a convergence guarantee, for estimating the semi-discrete version of this robust Optimal Transport divergence, and we design a parallelized SBI algorithm that applies the regular bootstrap on top of minimum semi-discrete robust Optimal Transport estimation to quantify parameter uncertainty. We demonstrate mathematically why the divergence is robust under joint geometric and Total Variation-type contamination, and then illustrate the robustness of the resulting inferences on a complex benchmark SBI task.
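To make the semi-discrete setting concrete, the sketch below runs averaged stochastic sub-gradient ascent on the dual of the classical (non-robust) semi-discrete Optimal Transport problem, $\max_v \; \mathbb{E}_{x\sim\mu}[\min_j (c(x, y_j) - v_j)] + \sum_j \nu_j v_j$. It is not the robust divergence or the algorithm introduced in this work. The function names `semidiscrete_ot_sgd` and `dual_value`, the squared Euclidean cost, the $1/\sqrt{k}$ step size, and the iterate averaging are all illustrative assumptions.

```python
import numpy as np

def semidiscrete_ot_sgd(sample_source, y, nu, n_iter=20000, lr=1.0, seed=0):
    """Averaged stochastic sub-gradient ascent on the classical
    semi-discrete OT dual (illustrative; not the robust variant).

    sample_source: callable rng -> one source sample of shape (d,)
    y:  (m, d) array of discrete target atoms
    nu: (m,) array of target weights summing to 1
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    nu = np.asarray(nu, dtype=float)
    v = np.zeros(len(y))       # current dual potentials
    v_avg = np.zeros(len(y))   # running average of the iterates
    for k in range(1, n_iter + 1):
        x = sample_source(rng)
        cost = np.sum((y - x) ** 2, axis=1)   # assumed squared Euclidean cost
        j = np.argmin(cost - v)               # atom whose cell contains x
        grad = nu.copy()                      # sub-gradient: nu - e_j
        grad[j] -= 1.0
        v += (lr / np.sqrt(k)) * grad         # ascent step
        v_avg += (v - v_avg) / k              # update running mean
    return v_avg

def dual_value(v, xs, y, nu):
    """Monte Carlo / grid estimate of the dual objective at potentials v."""
    cost = ((xs[:, None, :] - y[None, :, :]) ** 2).sum(axis=2)
    return np.mean(np.min(cost - v[None, :], axis=1)) + nu @ v
```

As a sanity check under these assumptions, take a uniform source on $[0, 1]$ and two equally weighted atoms at $0.25$ and $0.75$: by symmetry the two potentials should come out nearly equal, and the dual value should be close to the transport cost $1/48$.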