Bayesian modeling provides a principled approach to quantifying uncertainty in model parameters and model structure, and it has seen a surge of applications in recent years. Within a Bayesian workflow, we are concerned with model selection for the purpose of finding models that best explain the data, that is, models that help us understand the underlying data-generating process. Since we rarely have access to the true process, all we are left with in real-world analyses is incomplete causal knowledge from sources outside the current data, together with model predictions of those data. This raises the important question of when using prediction as a proxy for explanation is valid for the purpose of model selection. We approach this question by means of large-scale simulations of Bayesian generalized linear models, in which we investigate various causal and statistical misspecifications. Our results indicate that using prediction as a proxy for explanation is valid and safe only when the models under consideration are sufficiently consistent with the underlying causal structure of the true data-generating process.