Smoothing Methods for Automatic Differentiation Across Conditional Branches

arXiv comments: 21 pages, 17 figures; content updated to reflect the journal version. Published in IEEE Access, available at https://ieeexplore.ieee.org/abstract/document/10356054

Programs involving discontinuities introduced by control flow constructs such as conditional branches pose challenges to mathematical optimization methods that assume a degree of smoothness in the objective function's response surface. Smooth interpretation (SI) is a form of abstract interpretation that approximates the convolution of a program's output with a Gaussian kernel, thus smoothing its output in a principled manner. Here, we combine SI with automatic differentiation (AD) to efficiently compute gradients of smoothed programs. In contrast to AD across a regular program execution, these gradients also capture the effects of alternative control flow paths. The combination of SI with AD enables direct gradient-based parameter synthesis for branching programs, allowing, for instance, the calibration of simulation models or their combination with neural network models in machine learning pipelines. We detail the effects of the approximations made for tractability in SI and propose a novel Monte Carlo estimator that avoids the underlying assumptions by estimating the smoothed programs' gradients through a combination of AD and sampling. Using DiscoGrad, our tool for automatically translating simple C++ programs to a smooth differentiable form, we perform an extensive evaluation. We compare the combination of SI with AD and our Monte Carlo estimator to existing gradient-free and stochastic methods on four non-trivial and originally discontinuous problems ranging from classical simulation-based optimization to neural network-driven control. While the optimization progress with the SI-based estimator depends on the complexity of the program's control flow, our Monte Carlo estimator is competitive in all problems, exhibiting the fastest convergence by a substantial margin in our highest-dimensional problem.
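
The central idea above, replacing a discontinuous program by its convolution with a Gaussian kernel and differentiating the smoothed version, can be illustrated on a one-branch step function. The sketch below is not DiscoGrad, SI, or the estimator proposed in the paper (which combines AD with sampling). As a stand-in it uses the standard score-function identity for Gaussian smoothing, d/dx E[f(x + sigma*eps)] = E[f(x + sigma*eps) * eps] / sigma, and checks the result against the closed form for a step. The names `program`, `smoothed_value_and_grad` and `exact_smoothed_step`, as well as the values of `x`, `sigma` and the sample count, are illustrative assumptions.

```python
import math
import random


def program(x):
    """Toy branching program: a step at x = 0.

    Its pointwise derivative is 0 wherever it exists, so differentiating a
    single execution gives no signal about the branch.
    """
    if x > 0.0:
        return 1.0
    return 0.0


def smoothed_value_and_grad(f, x, sigma, n_samples, rng):
    """Monte Carlo estimate of g(x) = E[f(x + sigma*eps)], eps ~ N(0, 1), and dg/dx.

    The gradient uses the score-function identity
        dg/dx = E[f(x + sigma*eps) * eps] / sigma.
    f(x) is subtracted as a baseline to reduce variance; since E[eps] = 0,
    the baseline does not bias the estimate.
    """
    baseline = f(x)
    value_sum = 0.0
    grad_sum = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)
        y = f(x + sigma * eps)
        value_sum += y
        grad_sum += (y - baseline) * eps / sigma
    return value_sum / n_samples, grad_sum / n_samples


def exact_smoothed_step(x, sigma):
    """Closed form for the smoothed step: value Phi(x/sigma), gradient phi(x/sigma)/sigma."""
    z = x / sigma
    value = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    grad = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    return value, grad


rng = random.Random(0)  # fixed seed so the run is reproducible
x, sigma = 0.3, 0.5     # illustrative input and smoothing width
mc_value, mc_grad = smoothed_value_and_grad(program, x, sigma, 200_000, rng)
ref_value, ref_grad = exact_smoothed_step(x, sigma)
print(f"Monte Carlo: value={mc_value:.3f} grad={mc_grad:.3f}")
print(f"closed form: value={ref_value:.3f} grad={ref_grad:.3f}")
```

Unlike the derivative of a single execution, the smoothed gradient is nonzero near the branch point, because samples on both sides of the condition contribute. This is the sense in which the gradients "capture the effects of alternative control flow paths". A pure sampling estimator like this one ignores derivative information inside each branch, which is what combining sampling with AD is meant to exploit.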
