Model selection is a key task in statistics and plays a critical role across scientific disciplines. Although no model can fully capture the complexity of a real-world data-generating process, identifying the model that best approximates it can provide valuable insight. Bayesian statistics offers a flexible framework for model selection: prior beliefs are updated as new data become available, so the set of candidate models can be refined over time. This is typically done by computing posterior model probabilities, which quantify the support for each model given the observed data. When the likelihood function is intractable, however, exact computation of these posterior probabilities is infeasible. Approximate Bayesian Computation (ABC) has emerged as a likelihood-free alternative. It is traditionally used with summary statistics to reduce the dimensionality of the data, but this often causes an information loss that is difficult to quantify, particularly in model selection. Recent work proposes full-data approaches based on statistical distances, which bypass the need for summary statistics and potentially allow recovery of the exact posterior distribution. Despite these developments, full-data ABC approaches have not yet been widely applied to model selection problems. This paper addresses that gap by investigating the performance of ABC with statistical distances in model selection. Through simulation studies and an application to toad movement models, it explores whether full-data approaches can overcome the limitations of summary-statistic-based ABC for model choice.
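To make the idea of full-data ABC model choice concrete, the following is a minimal sketch of a rejection-ABC scheme that compares raw samples with a statistical distance instead of summary statistics. It is not the procedure used in the paper: the two candidate models, their priors, the choice of the 1-D Wasserstein distance, and the names `wasserstein_1d`, `simulate_normal`, `simulate_exponential`, and `abc_model_choice` are all illustrative assumptions.

```python
import random


def wasserstein_1d(x, y):
    """1-Wasserstein distance between two equal-size 1-D samples.

    For equal sample sizes this is the mean absolute gap between
    the sorted values (order statistics) of the two samples.
    """
    if len(x) != len(y):
        raise ValueError("samples must have the same size")
    return sum(abs(a - b) for a, b in zip(sorted(x), sorted(y))) / len(x)


# Hypothetical candidate models (assumptions for illustration only).
# Each draws a parameter from its prior, then simulates n observations.
def simulate_normal(n, rng):
    mu = rng.uniform(-3.0, 3.0)          # assumed prior on the mean
    return [rng.gauss(mu, 1.0) for _ in range(n)]


def simulate_exponential(n, rng):
    mean = rng.uniform(0.1, 5.0)         # assumed prior on the mean
    return [rng.expovariate(1.0 / mean) for _ in range(n)]


MODELS = {"normal": simulate_normal, "exponential": simulate_exponential}


def abc_model_choice(observed, n_sims=2000, n_keep=50, seed=0):
    """Approximate posterior model probabilities by rejection ABC.

    A model index is drawn from a uniform prior, a data set is simulated
    from that model, and the n_keep simulations closest to the observed
    data (in Wasserstein distance) are accepted. The share of accepted
    simulations per model approximates its posterior probability.
    """
    rng = random.Random(seed)
    names = list(MODELS)
    draws = []
    for _ in range(n_sims):
        name = rng.choice(names)
        simulated = MODELS[name](len(observed), rng)
        draws.append((wasserstein_1d(observed, simulated), name))
    draws.sort(key=lambda d: d[0])
    accepted = [name for _, name in draws[:n_keep]]
    return {name: accepted.count(name) / n_keep for name in names}


if __name__ == "__main__":
    data_rng = random.Random(1)
    observed = [data_rng.gauss(0.0, 1.0) for _ in range(100)]
    print(abc_model_choice(observed))
```

Keeping a fixed number of closest simulations, rather than fixing a distance threshold in advance, is one common way to set the ABC tolerance. The accepted fraction per model is only an approximation whose quality depends on the tolerance, the number of simulations, and the distance chosen.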