We study the problem of bounding the posterior distribution of discrete probabilistic programs with unbounded support, loops, and conditioning. Loops pose the main difficulty in this setting: even if exact Bayesian inference is possible, the state of the art requires user-provided loop invariant templates. By contrast, we aim to find guaranteed bounds, which sandwich the true distribution. They are fully automated, applicable to more programs and provide more provable guarantees than approximate sampling-based inference. Since lower bounds can be obtained by unrolling loops, the main challenge is upper bounds, and we attack it in two ways. The first is called residual mass semantics, which is a flat bound based on the residual probability mass of a loop. The approach is simple, efficient, and has provable guarantees. The main novelty of our work is the second approach, called geometric bound semantics. It operates on a novel family of distributions, called eventually geometric distributions (EGDs), and can bound the distribution of loops with a new form of loop invariants called contraction invariants. The invariant synthesis problem reduces to a system of polynomial inequality constraints, which is a decidable problem with automated solvers. If a solution exists, it yields an exponentially decreasing bound on the whole distribution, and can therefore bound moments and tail asymptotics as well, not just probabilities as in the first approach. Both semantics enjoy desirable theoretical properties. In particular, we prove soundness and convergence, i.e. the bounds converge to the exact posterior as loops are unrolled further. On the practical side, we describe Diabolo, a fully-automated implementation of both semantics, and evaluate them on a variety of benchmarks from the literature, demonstrating their general applicability and the utility of the resulting bounds.
翻译:我们研究了具有无界支持、循环和条件语句的离散概率程序后验分布的定界问题。循环是此场景中的主要难点:即使精确贝叶斯推断可行,现有技术仍需用户提供循环不变式模板。相比之下,我们的目标是寻找能夹逼真实分布的保证界。该方法完全自动化,适用于更多程序,且比基于近似采样的推断提供更多可证明的保证。由于下界可通过循环展开获得,主要挑战在于上界,我们通过两种方式解决该问题。第一种称为剩余质量语义,这是一种基于循环剩余概率质量的平坦界。该方法简单高效且具有可证明的保证。我们工作的主要创新在于第二种方法——几何界语义。它作用于新型分布族(称为最终几何分布),并可通过新型循环不变式(称为收缩不变式)对循环分布进行定界。不变式综合问题可归约为多项式不等式约束系统,这是可通过自动化求解器判定的可解问题。若解存在,则可获得在整个分布上指数衰减的界,因此不仅能界定概率(如第一种方法),还能界定矩和尾部渐近性质。两种语义均具备理想的理论性质。特别地,我们证明了可靠性与收敛性,即随着循环进一步展开,边界将收敛至精确后验分布。在实践层面,我们描述了完全自动化实现两种语义的系统Diabolo,并在文献中的各类基准测试上进行评估,证明了其普适性与所得边界的实用性。