In the realm of statistical learning, the growing volume of accessible data and the increasing complexity of models necessitate robust methodologies. This paper explores two branches of robust Bayesian methods in response to this trend. The first is generalized Bayesian inference, which introduces a learning rate parameter to enhance robustness against model misspecification. The second is Gibbs posterior inference, which formulates inferential problems using generic loss functions rather than probabilistic models. In both approaches, the spread of the posterior distribution must be calibrated by selecting a learning rate parameter. This study aims to improve the generalized posterior calibration (GPC) algorithm proposed by Syring and Martin (2019) [Biometrika, Volume 106, Issue 2, pp. 479-486]. Their algorithm chooses the learning rate to achieve the nominal frequentist coverage probability, but it is computationally intensive because it requires repeated posterior simulations for bootstrap samples. We propose a more efficient version of the GPC inspired by sequential Monte Carlo (SMC) samplers. As in the reweighting step of an SMC sampler, a target distribution with a different learning rate is evaluated by reweighting existing draws rather than rerunning the posterior simulation. Thus, the proposed algorithm can reach the desired learning rate within a few iterations. This improvement substantially reduces the computational cost of the GPC. Its efficacy is demonstrated through synthetic and real data applications.
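The core computational idea, evaluating a Gibbs posterior at a new learning rate by reweighting existing draws instead of re-simulating, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `gibbs_reweight` and the interface are assumptions, and the sketch relies only on the standard form of the Gibbs posterior, pi_eta(theta) proportional to exp(-eta * L(theta)) * prior(theta), where L is the empirical loss.

```python
import numpy as np

def gibbs_reweight(loss_vals, eta_old, eta_new):
    """Importance weights that move Gibbs-posterior draws from learning
    rate eta_old to eta_new without rerunning the sampler.

    loss_vals : empirical loss L(theta_i) evaluated at each posterior draw.
    Since pi_eta(theta) ∝ exp(-eta * L(theta)) * prior(theta), the weight
    for draw i is w_i ∝ exp(-(eta_new - eta_old) * L(theta_i)).
    """
    logw = -(eta_new - eta_old) * np.asarray(loss_vals, dtype=float)
    logw -= logw.max()              # subtract max for numerical stability
    w = np.exp(logw)
    return w / w.sum()              # normalized importance weights
```

With these weights, weighted quantiles of the draws give the credible set under the new learning rate, so each GPC iteration needs only a reweighting pass instead of a fresh posterior simulation for every bootstrap sample.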