We study the stability of posterior predictive inferences to the specification of the likelihood model and perturbations of the data generating process. In modern big-data analyses, useful broad structural judgements may be elicited from the decision-maker, but a level of interpolation is required to arrive at a likelihood model. As a result, a canonical form, often chosen for computational convenience, is used in place of the decision-maker's true beliefs. Equally, in practice, observational datasets often contain unforeseen heterogeneities and recording errors and therefore do not necessarily correspond to how the process was idealised by the decision-maker. Acknowledging such imprecisions, a faithful Bayesian analysis should ideally be stable across reasonable equivalence classes of such inputs. We are able to guarantee that traditional Bayesian updating provides stability across only a very strict class of likelihood models and data generating processes, requiring the decision-maker to elicit their beliefs and understand how the data were generated with an unreasonable degree of accuracy. On the other hand, a generalised Bayesian alternative using the $\beta$-divergence loss function is shown to be stable across practical and interpretable neighbourhoods, providing assurances that posterior inferences are not overly dependent on accidentally introduced spurious specifications or data collection errors. We illustrate this in linear regression, binary classification, and mixture modelling examples, showing that stable updating does not compromise the ability to learn about the data generating process. These stability results provide a compelling justification for using generalised Bayes to facilitate inference under simplified canonical models.
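To make the contrast between the two updating rules concrete, the sketch below compares the log score that underlies traditional Bayesian updating with a $\beta$-divergence loss for a Gaussian model. The abstract does not state the loss explicitly, so the form used here is an assumption: the common density-power parametrisation $\ell_\beta(x,\theta) = -\frac{1}{\beta-1} f(x;\theta)^{\beta-1} + \frac{1}{\beta}\int f(z;\theta)^{\beta}\,dz$ with $\beta > 1$, entering a generalised posterior proportional to $\pi(\theta)\exp\{-\sum_i \ell_\beta(x_i,\theta)\}$. The function names, the Gaussian model, and the value $\beta = 1.5$ are illustrative choices only.

```python
import math

def log_loss(x, mu, sigma):
    """Negative log-density of N(mu, sigma^2) at x (the loss behind standard Bayes)."""
    return 0.5 * math.log(2 * math.pi * sigma**2) + (x - mu) ** 2 / (2 * sigma**2)

def beta_loss(x, mu, sigma, beta):
    """beta-divergence loss for a Gaussian model; assumes beta > 1 (hypothetical helper)."""
    density = math.exp(-log_loss(x, mu, sigma))
    # Closed form of the integral of N(z; mu, sigma^2)^beta over z.
    integral = (2 * math.pi * sigma**2) ** ((1 - beta) / 2) / math.sqrt(beta)
    return -density ** (beta - 1) / (beta - 1) + integral / beta

# Compare how each loss reacts as an observation moves away from the model mean.
for x in (0.0, 3.0, 10.0, 100.0):
    print(x, log_loss(x, 0.0, 1.0), beta_loss(x, 0.0, 1.0, 1.5))
```

Under these assumptions the log score grows quadratically in the distance of an observation from the mean, whereas the $\beta$-divergence loss levels off at the value of the integral term. A single recording error can therefore contribute only a bounded amount to the generalised posterior, which is one intuition for the stability described above.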