Incremental Computation for Efficient Programmable Inference in Probabilistic Programs

Inference in probabilistic programs generally requires evaluating many possible program executions to find those of high posterior density. To scale inference to large datasets, it is crucial that expensive intermediate results be shared across these evaluations rather than recomputed from scratch. This paper presents a new approach to realizing this sharing, based on \textit{incremental computation}, a technique for efficiently recomputing (deterministic) program outputs when program inputs change. First, we show how expressive probabilistic programs can be compiled to deterministic ones that compute their density functions. Then, building on the incremental $λ$-calculus, we develop a general technique for compositionally incrementalizing expressive functional programs, and apply it to these densities. The resulting incremental densities can be used to accelerate a broad range of Monte Carlo inference algorithms, including algorithms for nonparametric models that are not well supported by existing systems. Furthermore, decomposing incremental density computation into separate density and incrementalization steps allows for modular reasoning about correctness, a key pain point in existing systems, where ad hoc incrementalization features are a known source of soundness bugs. We develop denotational logical relations arguments for the correctness of each step independently, and implement the approach in a Julia prototype. We find that it yields asymptotic runtime improvements, as a function of dataset size, on a range of models and inference algorithms.
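To make the idea of an incremental density concrete, here is a minimal Python sketch. It is not the paper's compilation scheme or its Julia prototype: the delta is written by hand rather than derived automatically, and the model (a two-component Gaussian mixture with unit variance) and all function names are assumptions chosen for illustration. It shows that when a single cluster assignment changes, the joint log density can be updated from one term instead of being summed again over the whole dataset.

```python
import math

def normal_logpdf(x, mean, std=1.0):
    """Log density of a univariate normal distribution."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)

def full_log_density(xs, zs, means):
    """Likelihood term of the log density, recomputed from scratch: O(N)."""
    return sum(normal_logpdf(x, means[z]) for x, z in zip(xs, zs))

def delta_log_density(xs, zs, means, i, new_z):
    """Change in the log density when only zs[i] becomes new_z: O(1).

    Hand-written for this toy model (an assumption of this sketch);
    the approach described above would derive such a delta automatically.
    """
    return normal_logpdf(xs[i], means[new_z]) - normal_logpdf(xs[i], means[zs[i]])

# Hypothetical data: five observations, two clusters with fixed means.
xs = [0.1, -0.3, 4.2, 3.9, 0.2]
zs = [0, 0, 1, 1, 1]
means = [0.0, 4.0]

old_score = full_log_density(xs, zs, means)

# Propose moving the last observation from cluster 1 to cluster 0.
delta = delta_log_density(xs, zs, means, i=4, new_z=0)
incremental_score = old_score + delta

# Reference value: recompute everything with the new assignment.
new_zs = zs[:4] + [0]
recomputed_score = full_log_density(xs, new_zs, means)
```

In a Metropolis-Hastings step that proposes changing one assignment, only `delta` is needed for the acceptance ratio, so the per-step cost in this sketch does not grow with the number of observations.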
