Probabilistic programming languages rely fundamentally on some notion of sampling, and this is doubly true for probabilistic programming languages which perform Bayesian inference using Monte Carlo techniques. Verifying samplers - proving that they generate samples from the correct distribution - is crucial to the use of probabilistic programming languages for statistical modelling and inference. However, the typical denotational semantics of probabilistic programs is incompatible with deterministic notions of sampling. This is problematic, considering that most statistical inference is performed using pseudorandom number generators. We present a higher-order probabilistic programming language centred on the notion of samplers and sampler operations. We give this language an operational and denotational semantics in terms of continuous maps between topological spaces. Our language also supports discontinuous operations, such as comparisons between reals, by using the type system to track discontinuities. This feature might be of independent interest, for example in the context of differentiable programming. Using this language, we develop tools for the formal verification of sampler correctness. We present an equational calculus to reason about equivalence of samplers, and a sound calculus to prove semantic correctness of samplers, i.e. that a sampler correctly targets a given measure by construction.
翻译:概率编程语言本质上依赖于某种采样概念,而对于使用蒙特卡洛技术进行贝叶斯推断的概率编程语言而言,这种依赖性尤为突出。验证采样器——即证明它们能从正确的分布中生成样本——对于将概率编程语言用于统计建模和推断至关重要。然而,概率程序的典型指称语义与确定性的采样概念并不兼容。考虑到大多数统计推断是利用伪随机数生成器进行的,这便构成了一个问题。我们提出了一种以采样器和采样器操作为核心的高阶概率编程语言。我们为这种语言赋予了基于拓扑空间之间连续映射的操作语义和指称语义。我们的语言还通过使用类型系统来追踪不连续性,从而支持诸如实数比较之类的不连续操作。这一特性可能具有独立的研究价值,例如在可微编程的背景下。利用这种语言,我们开发了用于形式化验证采样器正确性的工具。我们提出了一套用于推理采样器等价性的等式演算,以及一套用于证明采样器语义正确性的可靠演算,即证明一个采样器能够根据构造正确地针对给定的测度进行采样。