Smoothing Methods for Automatic Differentiation Across Conditional Branches

Programs involving discontinuities introduced by control flow constructs such as conditional branches pose challenges to mathematical optimization methods that assume a degree of smoothness in the objective function's response surface. Smooth interpretation (SI) is a form of abstract interpretation that approximates the convolution of a program's output with a Gaussian kernel, thus smoothing its output in a principled manner. Here, we combine SI with automatic differentiation (AD) to efficiently compute gradients of smoothed programs. In contrast to AD across a regular program execution, these gradients also capture the effects of alternative control flow paths. The combination of SI with AD enables direct gradient-based parameter synthesis for branching programs, allowing, for instance, the calibration of simulation models or their combination with neural network models in machine learning pipelines. We detail the effects of the approximations made for tractability in SI and propose a novel Monte Carlo estimator that avoids the underlying assumptions by estimating the smoothed programs' gradients through a combination of AD and sampling. Using DiscoGrad, our tool for automatically translating simple C++ programs to a smooth differentiable form, we perform an extensive evaluation. We compare the combination of SI with AD and our Monte Carlo estimator to existing gradient-free and stochastic methods on four non-trivial and originally discontinuous problems ranging from classical simulation-based optimization to neural network-driven control. While the optimization progress with the SI-based estimator depends on the complexity of the programs' control flow, our Monte Carlo estimator is competitive in all problems, exhibiting the fastest convergence by a substantial margin in our highest-dimensional problem.
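To make the smoothing objective concrete, the following is a minimal sketch, not the SI procedure or the DiscoGrad estimator. It assumes a hypothetical one-dimensional toy program with a single branch (a step at zero) and a hypothetical smoothing width `sigma`. For this program, pathwise AD along any single execution yields a derivative of zero, whereas the Gaussian-smoothed program has the closed form Phi(x / sigma) with a nonzero derivative. The sampling-based gradient shown here is the standard score-function estimator, swapped in purely for illustration; all function names are assumptions.

```python
import math
import random


def program(x):
    """Hypothetical toy branching program: a step at x = 0."""
    if x > 0.0:
        return 1.0
    return 0.0


def smoothed_exact(x, sigma):
    """Closed-form convolution of the step with a Gaussian: Phi(x / sigma)."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))


def smoothed_grad_exact(x, sigma):
    """Derivative of Phi(x / sigma) with respect to x."""
    z = x / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def smoothed_mc(x, sigma, n, rng):
    """Monte Carlo estimates of the smoothed value and its gradient.

    The gradient uses the score-function identity (an illustrative
    stand-in, not the estimator proposed in the paper):
        d/dx E[f(x + sigma * eps)] = E[f(x + sigma * eps) * eps] / sigma
    with eps drawn from a standard normal distribution.
    """
    value_sum = 0.0
    grad_sum = 0.0
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        y = program(x + sigma * eps)
        value_sum += y
        grad_sum += y * eps / sigma
    return value_sum / n, grad_sum / n


if __name__ == "__main__":
    rng = random.Random(0)
    x, sigma = 0.3, 0.5
    value, grad = smoothed_mc(x, sigma, 200_000, rng)
    print("MC value:", value, "exact:", smoothed_exact(x, sigma))
    print("MC grad: ", grad, "exact:", smoothed_grad_exact(x, sigma))
```

The sampled estimates should approach the closed-form values as the number of samples grows. The sketch only illustrates why the gradient of the smoothed program carries information about the branch that a single execution path does not.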
