The generality and robustness of inference algorithms are critical to the success of widely used probabilistic programming languages such as Stan, PyMC, Pyro, and Turing.jl. When designing a new general-purpose inference algorithm, whether based on Monte Carlo sampling or variational approximation, a fundamental problem is evaluating its accuracy and efficiency across a range of representative target models. To address this problem, we propose posteriordb, a database of models and data sets that define target densities, together with reference Monte Carlo draws. We further provide a guide to best practices for using posteriordb for model evaluation and comparison. To cover a wide range of realistic target densities, posteriordb currently comprises 120 representative models, and it has been instrumental in developing several general inference algorithms.