Approximate Bayesian Computation with Statistical Distances for Model Selection

Model selection is a key task in statistics and plays a critical role across scientific disciplines. While no model can fully capture the complexities of a real-world data-generating process, identifying the model that best approximates it can provide valuable insights. Bayesian statistics offers a flexible framework for model selection: prior beliefs are updated as new data become available, allowing ongoing refinement of candidate models. This is typically achieved by calculating posterior model probabilities, which quantify the support for each model given the observed data. When likelihood functions are intractable, however, exact computation of these posterior probabilities becomes infeasible. Approximate Bayesian Computation (ABC) has emerged as a likelihood-free alternative. It is traditionally used with summary statistics to reduce data dimensionality, but this often causes an information loss that is difficult to quantify, particularly in model selection contexts. Recent work proposes full data approaches based on statistical distances, a promising alternative that bypasses the need for summary statistics and potentially allows recovery of the exact posterior distribution. Despite these developments, full data ABC approaches have not yet been widely applied to model selection problems. This paper addresses that gap by investigating the performance of ABC with statistical distances in model selection. Through simulation studies and an application to toad movement models, it explores whether full data approaches can overcome the limitations of summary-statistic-based ABC for model choice.
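To make the idea of full data ABC for model choice concrete, here is a minimal rejection-sampling sketch. It is an illustration under stated assumptions, not the procedure used in the paper: the two candidate models (Gaussian and Laplace), the uniform priors, the choice of the 1-Wasserstein distance, and the names `wasserstein_1d`, `simulate`, and `abc_model_choice` are all hypothetical. Each simulated data set is compared with the observed one directly, with no summary statistics, and the posterior model probabilities are estimated by the share of each model among the closest simulations.

```python
import random


def wasserstein_1d(x, y):
    """1-Wasserstein distance between two equal-size one-dimensional samples."""
    if len(x) != len(y):
        raise ValueError("samples must have the same size")
    # For equal-size samples this is the mean gap between order statistics.
    return sum(abs(a - b) for a, b in zip(sorted(x), sorted(y))) / len(x)


def simulate(model, theta, n, rng):
    """Draw n points from a hypothetical candidate model with location theta."""
    if model == 0:
        # Model 0 (assumption): Gaussian with unit variance.
        return [rng.gauss(theta, 1.0) for _ in range(n)]
    # Model 1 (assumption): Laplace with unit scale, built as the
    # difference of two independent Exp(1) draws.
    return [theta + rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(n)]


def abc_model_choice(observed, n_sims=2000, keep_fraction=0.05, seed=0):
    """Estimate posterior model probabilities by ABC rejection on the full data."""
    rng = random.Random(seed)
    n = len(observed)
    draws = []
    for _ in range(n_sims):
        model = rng.randrange(2)        # uniform prior over the two models
        theta = rng.uniform(-3.0, 3.0)  # assumed prior on the location
        fake = simulate(model, theta, n, rng)
        draws.append((wasserstein_1d(observed, fake), model))
    # Keep the simulations closest to the observed data.
    draws.sort(key=lambda d: d[0])
    n_keep = max(1, int(keep_fraction * n_sims))
    accepted = draws[:n_keep]
    p1 = sum(m for _, m in accepted) / n_keep
    return {0: 1.0 - p1, 1: p1}


# Usage: observed data generated from model 0, then model probabilities estimated.
data_rng = random.Random(1)
observed = simulate(0, 0.5, 100, data_rng)
probs = abc_model_choice(observed)
print(probs)
```

Keeping a fixed fraction of the closest simulations, rather than fixing a tolerance in advance, is a common practical choice in ABC rejection; the resulting probabilities are only approximations whose quality depends on the distance, the tolerance, and the number of simulations.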

